| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Pokémon | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Pokémon | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Pokémon | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Pokémon | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Pokémon | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Pokémon | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Pokémon | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Pokémon | Pass (no consensus errors) |
| google/gemma-3-12b-it | Pokémon | Pass (no consensus errors) |
| google/gemma-3-1b-it | Pokémon | Pass (no consensus errors) |
| google/gemma-3-4b-it | Pokémon | Pass (no consensus errors) |
| google/translategemma-12b-it | Pokémon | Pass (no consensus errors) |
| google/translategemma-4b-it | Pokémon | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Pokémon | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Pokémon | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | DoH | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | DoH | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | DoH | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | DoH | Pass (no consensus errors) |
| Qwen/Qwen3-14B | DoH | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | DoH | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | DoH | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | DoH | Pass (no consensus errors) |
| google/gemma-3-12b-it | DoH | Pass (no consensus errors) |
| google/gemma-3-1b-it | DoH | Pass (no consensus errors) |
| google/gemma-3-4b-it | Download | Critical [Accuracy] "Download". Rationale: (1) The source 'DoH' (likely DNS over HTTPS or an acronym) is incorrectly translated as 'Download', changing the meaning entirely. (2) Mistranslation: 'DoH' (DNS over HTTPS) translated as 'Download' - completely wrong meaning (3) Mistranslation of acronym; 'DoH' is left untranslated in reference but was incorrectly expanded to 'Download'. (4) Incorrect translation of 'DoH'. (5) Hypothesis 'Download' does not match source 'DoH' (DNS over HTTPS). Complete mistranslation. |
| google/translategemma-12b-it | DoH | Pass (no consensus errors) |
| google/translategemma-4b-it | DoH | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | 登录 | Critical [Accuracy] "登录". Rationale: (1) Mistranslation of acronym 'DoH'; source term is left untranslated in reference but hypothesis invents unrelated meaning 'log in'. (2) Hypothesis '登录' (login) is completely unrelated to source 'DoH' (DNS over HTTPS). Should be 'DoH' or a Chinese equivalent. (3) The source 'DoH' (likely DNS over HTTPS or a proper noun) is incorrectly translated as 'Login' ('登录'), which is a complete hallucination and loss of meaning. (4) Hypothesis is completely unrelated to source - '登录' means 'login' but source is 'DoH' (DNS over HTTPS) |
| tencent/HY-MT1.5-7B | DoH | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | CSDN | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | CSDN | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | CSDN | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | CSDN | Pass (no consensus errors) |
| Qwen/Qwen3-14B | CSDN | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | CSDN | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | CSDN | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | CSDN | Pass (no consensus errors) |
| google/gemma-3-12b-it | CSDN | Pass (no consensus errors) |
| google/gemma-3-1b-it | CSDN | Pass (no consensus errors) |
| google/gemma-3-4b-it | CSDN | Pass (no consensus errors) |
| google/translategemma-12b-it | CSDN | Pass (no consensus errors) |
| google/translategemma-4b-it | CSDN | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | 登录 | Critical [Accuracy] "登录". Rationale: (1) Mistranslation; 'CSDN' is a name that should be kept as is, while hypothesis means 'log in'. (2) Hypothesis '登录' (login) is completely unrelated to source 'CSDN' (Chinese developer platform acronym). Should be 'CSDN'. (3) The source 'CSDN' (a proper noun for a developer community) is incorrectly translated as 'Login' ('登录'), representing a total failure to translate the entity. (4) Hypothesis is completely unrelated to source - '登录' means 'login' but source is 'CSDN' (Chinese tech blog platform) |
| tencent/HY-MT1.5-7B | CSDN | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | DALL-E | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | DALL-E | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | DALL-E | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | DALL-E | Pass (no consensus errors) |
| Qwen/Qwen3-14B | DALL-E | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | DALL-E | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | DALL-E | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | DALL-E | Pass (no consensus errors) |
| google/gemma-3-12b-it | DALL-E | Pass (no consensus errors) |
| google/gemma-3-1b-it | DALL-E | Pass (no consensus errors) |
| google/gemma-3-4b-it | DALL-E | Pass (no consensus errors) |
| google/translategemma-12b-it | DALL-E | Pass (no consensus errors) |
| google/translategemma-4b-it | DALL-E | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | DALL-E | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | DALL-E | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | I am sorry for any inconvenience caused | Critical [Accuracy] "I am sorry for any inconvenience caused". Rationale: (1) Complete mistranslation - source means 'Chit-chat / Gossip' but hypothesis is an unrelated English apology phrase (2) Completely unrelated apology; does not translate the source idiom meaning gossip/chit-chat. (3) Hypothesis is completely unrelated to source meaning. Source '搞七捻三' means chit-chat/gossip, but hypothesis is an apology phrase. (4) The translation completely misinterprets the source meaning. '搞七捻三' refers to gossiping or meddling, while the hypothesis is a standard apology phrase unrelated to the source. (5) The translation does not match the source meaning. |
| CohereLabs/tiny-aya-water | mess around | Major [Accuracy] "mess around". Rationale: (1) Hypothesis translation 'mess around' does not match reference 'Chit-chat / Gossip'. The meaning is significantly different. (2) Mistranslation of idiom; source emphasizes gossip/chit-chat while hypothesis focuses on aimless behavior, losing the nuance of talking/gossiping. (3) The source '搞七捻三' refers to meddling, making trouble, or creating a mess through unnecessary complications. 'Mess around' implies playing or wasting time, which misses the negative connotation of causing trouble or confusion inherent in the source. (4) Incorrect translation of '搞七捻三' which means gossip or chit-chat. (5) Incorrect translation - '搞七捻三' means 'chit-chat/gossip', not 'mess around' |
| Qwen/Qwen2.5-14B-Instruct | To mess around or create trouble | Major [Accuracy] "To mess around or create trouble". Rationale: (1) The translation does not match the reference and may not accurately convey the intended meaning. (2) The translation misinterprets the source idiom '搞七捻三' (which implies meddling, making a mess of things, or frivolous behavior) as 'creating trouble' or generic 'messing around'. The reference 'Chit-chat / Gossip' suggests a context of frivolous talk or meddling in affairs, whereas the hypothesis leans towards causing disruption, altering the intended nuance significantly. (3) Mistranslation of idiom; source and reference convey casual chatting/gossip rather than causing trouble. (4) Translation meaning is significantly different from reference. Hypothesis conveys 'to mess around' while reference indicates 'chit-chat/gossip', which are distinct meanings. (5) Incorrect translation - source means 'chit-chat/gossip' not 'mess around or create trouble' |
| Qwen/Qwen2.5-7B-Instruct | 搞七捻三 | Major [Accuracy] "搞七捻三". Rationale: (1) The Chinese phrase is left untranslated; required English meaning "chit-chat/gossip" is missing. (2) The translation does not convey the intended meaning of the source text. (3) Source text is Chinese, but target language is specified as Chinese. The hypothesis copies the source instead of translating it to the target language (English implied by reference), resulting in a complete failure to translate. (4) Hypothesis keeps source Chinese text unchanged while reference provides English translation. Target language is zh but translation to English is required. (5) Source is a Chinese idiom meaning gossip/chit-chat, but hypothesis keeps it untranslated while reference provides the meaning |
| Qwen/Qwen3-14B | mess around | Major [Accuracy] "mess around". Rationale: (1) The translation does not match the source meaning. The correct translation should be 'Chit-chat / Gossip'. (2) Incorrect translation - '搞七捻三' means chit-chat/gossip, not 'mess around' (3) The translation 'mess around' implies frivolous behavior or wasting time, whereas the source '搞七捻三' refers to gossiping, chit-chatting, or meddling in affairs. The meaning is significantly distorted. (4) Mistranslation of idiom; “搞七捻三” refers more specifically to gossiping or chit-chat rather than generic messing around. (5) Translation does not match reference meaning. 'mess around' is inaccurate for '搞七捻三' which means chit-chat or gossip, not to mess around. |
| Qwen/Qwen3.5-4B | Chaotic and unpredictable behavior | Major [Accuracy] "Chaotic and unpredictable behavior". Rationale: (1) The translation misinterprets the source idiom '搞七捻三' (meaning to meddle, make a fuss, or create unnecessary complications) as 'chaotic and unpredictable behavior', which alters the core meaning. The reference 'Chit-chat / Gossip' is also an interpretation, but the hypothesis is a significant semantic deviation. (2) Translation does not match reference meaning - reference indicates 'Chit-chat / Gossip' while hypothesis gives 'Chaotic and unpredictable behavior' (3) The translation does not match the reference and conveys a different meaning. (4) Mistranslation of idiom; focuses on unpredictability rather than casual chatting/gossip implied by the source and reference. (5) Incorrect translation of '搞七捻三'. The reference 'Chit-chat / Gossip' is more accurate than 'Chaotic and unpredictable behavior'. |
| Qwen/Qwen3.5-9B | To be confused or to act foolishly | Major [Accuracy] "To be confused or to act foolishly". Rationale: (1) Incorrect translation - source means gossip/chit-chat, not confused/foolish (2) Mistranslation of the idiom “搞七捻三”, which refers to chit-chatting or engaging in trivial gossip, not confusion or acting foolishly. (3) Incorrect translation. '搞七捻三' means chit-chat or gossip, not to be confused or act foolishly (4) The translation misinterprets the source idiom '搞七捻三' (which means to meddle, make a mess, or create unnecessary complications) as 'being confused' or 'acting foolishly'. The reference 'Chit-chat / Gossip' is also slightly off, but the hypothesis is a clear semantic error regarding the action described. (5) The translation does not match the reference and does not accurately convey the meaning of the source text. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Seven and Three | Major [Accuracy] "Seven and Three". Rationale: (1) Mistranslation - source idiom '搞七捻三' means gossip/chit-chat, not literal 'Seven and Three' (2) Literal word-for-word rendering that does not convey the idiomatic meaning of “搞七捻三” (chit-chat / gossip), leading to mistranslation. (3) Completely incorrect translation. Source means 'chit-chat/gossip' but hypothesis literally translates numbers without capturing meaning. (4) Literal translation of an idiom results in loss of meaning; the source '搞七捻三' means gossiping or messing around, not the numbers seven and three. (5) Incorrect translation of '搞七捻三' which means gossip or chit-chat. |
| google/gemma-3-12b-it | A little bit; slightly | Major [Accuracy] "A little bit; slightly". Rationale: (1) The translation does not match the source meaning. The correct translation should be related to chit-chat or gossip. (2) Mistranslation of the idiom “搞七捻三”, which means chatting/gossiping or making a fuss, not expressing degree like “a little bit; slightly”. (3) Translation does not match reference meaning - reference means 'Chit-chat / Gossip' but hypothesis translates as 'A little bit; slightly' (4) The translation incorrectly interprets the idiom '搞七捻三' (meaning to meddle, make trouble, or gossip) as 'a little bit/slightly', resulting in a complete loss of meaning. (5) Translation is completely incorrect. '搞七捻三' means 'chit-chat/gossip', not 'a little bit; slightly' |
| google/gemma-3-1b-it | 搞七捻三 | Major [Accuracy] "搞七捻三". Rationale: (1) Source text is Chinese, but target language is specified as Chinese. The hypothesis copies the source instead of translating it to English (as implied by the reference 'Chit-chat / Gossip'). This is a failure to translate. (2) The Chinese slang phrase meaning idle chat/gossip is left untranslated; target should convey the meaning rather than copy the source. (3) Source is a Chinese idiom meaning gossip/chit-chat, but hypothesis keeps it untranslated instead of providing the meaning 'Chit-chat / Gossip' (4) Hypothesis is in Chinese while target language is zh but reference is in English. The hypothesis should be translated to English as 'Chit-chat / Gossip' or similar, not kept in source language. (5) The hypothesis repeats the source text without translating it, while the reference provides a translation. |
| google/gemma-3-4b-it | Pull off a scam | Major [Accuracy] "Pull off a scam". Rationale: (1) Mistranslation: '搞七捻三' means 'chit-chat/gossip' not 'pull off a scam' - completely wrong meaning (2) The source '搞七捻三' implies meddling, gossiping, or making a fuss, not committing a scam. The translation misrepresents the semantic meaning. (3) Hypothesis 'Pull off a scam' does not match reference 'Chit-chat / Gossip'. Incorrect meaning of source phrase. (4) Incorrect translation of '搞七捻三'. (5) Idiomatic meaning is wrong; source implies chatting/gossiping aimlessly, not committing fraud. |
| google/translategemma-12b-it | Something is wrong. | Critical [Accuracy] "Something is wrong.". Rationale: (1) Mistranslation of the idiom “搞七捻三”, which means chit-chat/gossip/fooling around; the hypothesis gives an unrelated statement. (2) The hypothesis fails to translate the source text '搞七捻三' (meaning chit-chat/gossip/messing around) and instead outputs an unrelated meta-comment indicating a generation failure. (3) Hypothesis does not translate the source term. Should be 'Chit-chat / Gossip' or similar meaning, not a meta-comment about translation failure. (4) Complete mistranslation - source means 'Chit-chat / Gossip' but hypothesis says 'Something is wrong.' (5) The translation does not match the source meaning. |
| google/translategemma-4b-it | Home | Critical [Accuracy] "Home". Rationale: (1) Complete mistranslation; source means 'messy/gossip' but hypothesis translates as 'Home'. (2) Completely incorrect translation. Source '搞七捻三' means 'Chit-chat / Gossip', not 'Home' (3) Incorrect translation of '搞七捻三' to 'Home' instead of 'Chit-chat / Gossip' (4) Mistranslation of “搞七捻三”, which means ‘chit-chat / gossip’ or ‘messing around’, not ‘Home’. (5) Complete mistranslation - source means 'chit-chat/gossip' but hypothesis translates to unrelated word 'Home' |
| tencent/HY-MT1.5-1.8B | Translation of '搞七捻三' in en | Major [Accuracy] "Translation of '搞七捻三' in en". Rationale: (1) The model output is a meta-commentary describing the task instead of providing the actual translation. The target language should be Chinese, but the output is English text describing the translation process. (2) Untranslated placeholder text; does not convey the meaning of the idiom 'chit-chat/gossip'. (3) Hypothesis is a meta-description rather than an actual translation. Should provide the translation 'Chit-chat / Gossip' or equivalent. (4) Partial translation provided instead of the actual term. (5) Hypothesis is a placeholder template rather than actual translation; reference provides correct meaning 'Chit-chat / Gossip' |
| tencent/HY-MT1.5-7B | Play with twists and turns | Major [Accuracy] "Play with twists and turns". Rationale: (1) The translation is a literal interpretation of the characters that fails to convey the actual idiomatic meaning of 'making trouble' or 'causing a fuss' implied by the source and reference. (2) Incorrect translation of '搞七捻三'. The hypothesis translates it as 'Play with twists and turns' but the reference correctly identifies it as 'Chit-chat / Gossip', which is the actual meaning of this Chinese idiom. (3) The translation does not accurately convey the meaning of the source text, which refers to gossip or chit-chat. (4) Mistranslation of idiom; original means casual chatting or gossiping, not playing with twists and turns. (5) Mistranslation - source idiom '搞七捻三' means gossip/chit-chat, not 'play with twists and turns' |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Board for development | Minor [Fluency] "Board for development". Rationale: (1) Word order is awkward. Reference 'Development Boards' is more natural English phrasing than 'Board for development'. (2) Unnatural phrasing; should be 'development board' or 'development boards' to match common terminology. (3) The translation is less common than the reference. |
| CohereLabs/tiny-aya-water | Development Board | Minor [Accuracy] "Development Board". Rationale: (1) The hypothesis uses the singular form while the reference uses the plural, which may indicate a slight inaccuracy. (2) Hypothesis uses singular 'Board' while reference uses plural 'Boards'. Source '开发板' can be singular or plural, but reference indicates plural form is preferred. (3) Number mismatch with plural reference; should be plural in this context. |
| Qwen/Qwen2.5-14B-Instruct | Development Board | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Development Board | Minor [Accuracy] "Development Board". Rationale: (1) Source text is Chinese, but target language is specified as Chinese. The hypothesis translates into English instead of retaining or processing in the target language, contradicting the language direction. (2) Singular vs. plural mismatch with reference "Development Boards"; slight loss of number information. (3) Hypothesis uses singular 'Board' while reference uses plural 'Boards' |
| Qwen/Qwen3-14B | Development Board | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Development Board | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Development Board | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Development Board | Minor [Accuracy] "Development Board". Rationale: (1) Number disagreement; source '开发板' is often used as a general category or plural in UI contexts, while hypothesis uses singular. (2) Singular form used instead of plural 'Development Boards' as in reference. (3) Singular form used where the reference indicates plural “Development Boards”; may cause a slight mismatch in intended scope. |
| google/gemma-3-12b-it | Development Board | Minor [Accuracy] "Development Board". Rationale: (1) Source “开发板” in this context refers to the plural concept “Development Boards”; singular vs. plural mismatch with the reference. (2) The translation is missing the plural form. The correct translation should be 'Development Boards'. (3) The source '开发板' is a general concept often implying plural in UI contexts; the singular 'Board' may be slightly less accurate than the reference 'Boards', though acceptable. |
| google/gemma-3-1b-it | Development board | Minor [Accuracy] "Development board". Rationale: (1) Hypothesis uses singular 'board' while reference uses plural 'Boards', which is the more standard terminology. (2) Source text is Chinese, but target language is specified as Chinese. The hypothesis translates into English instead of remaining in Chinese or translating appropriately if the direction was misunderstood. Given the reference is English, the task seems to be Zh->En, but the prompt says Target Language: zh. If Target is Zh, this is a critical language error. If the task is Zh->En based on references, the translation is accurate. However, strictly following 'Target Language: zh', outputting English is a critical error. Assuming the prompt's 'Target Language: zh' is the constraint, outputting English is wrong. (3) Source is Chinese and target is English; expected translation into Chinese for zh target language content. |
| google/gemma-3-4b-it | Development Board | Minor [Accuracy] "Development Board". Rationale: (1) Number mismatch with plural 'Development Boards' in reference; meaning is mostly preserved but not fully aligned. (2) Missing plural form compared to reference. (3) The translation uses singular 'Board' while the reference uses plural 'Boards'. Without context, singular is acceptable, but it deviates from the preferred terminology if the source implies a category. |
| google/translategemma-12b-it | Development board | Minor [Terminology] "Development board". Rationale: (1) The translation uses a singular form while the reference uses plural. (2) Singular form used instead of plural 'Development Boards' as in reference. (3) Singular form used while reference uses plural 'Development Boards' |
| google/translategemma-4b-it | Development Board | Minor [Accuracy] "Development Board". Rationale: (1) Number disagreement; source implies plural context matching reference 'Development Boards', hypothesis uses singular. (2) Singular form used instead of plural. Reference uses 'Development Boards' (3) Singular used instead of the reference plural “Development Boards”; meaning largely preserved but number is off. |
| tencent/HY-MT1.5-1.8B | Development Board | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Development Board | Minor [Accuracy] "Development Board". Rationale: (1) Number mismatch; source and reference imply plural "Boards" in this UI context, but hypothesis uses singular. (2) The translation is missing the plural form 's' which is present in the reference. (3) Singular form 'Development Board' used instead of plural 'Development Boards' as in reference. The source '开发板' can be plural in context. |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Edge | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Edge | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Edge | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Edge | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Edge | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Edge | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Edge | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Edge | Pass (no consensus errors) |
| google/gemma-3-12b-it | Edge | Pass (no consensus errors) |
| google/gemma-3-1b-it | Edge | Pass (no consensus errors) |
| google/gemma-3-4b-it | Refresh | Critical [Accuracy] "Refresh". Rationale: (1) Incorrect translation of 'Edge'. (2) Mistranslation: 'Edge' (browser) translated as 'Refresh' - completely wrong meaning (3) Completely incorrect term; 'Edge' (browser/product name) is not translated as 'Refresh'. (4) The source 'Edge' (likely a proper noun or specific term) is completely mistranslated as 'Refresh'. (5) Hypothesis 'Refresh' does not match source 'Edge'. Complete mistranslation. |
| google/translategemma-12b-it | Edge | Pass (no consensus errors) |
| google/translategemma-4b-it | Edge | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Edge | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Edge | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Please provide us with your contact information. | Critical [Accuracy] "Please provide us with your contact information.". Rationale: (1) The translation is a complete hallucination. The source means 'Happy 2nd Anniversary', but the hypothesis asks for contact information. (2) Meaning is entirely different from the source phrase 'Happy 2nd Anniversary'. (3) The translation does not match the source meaning. (4) Complete mistranslation - source means 'Happy 2nd Anniversary' but hypothesis is an unrelated request for contact information (5) Hypothesis is completely unrelated to source. Source '二周年快乐' means 'Happy 2nd Anniversary', but hypothesis is a request for contact information. |
| CohereLabs/tiny-aya-water | Happy second anniversary | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Happy Second Anniversary | Minor [Style] "Happy Second Anniversary". Rationale: (1) Reference uses abbreviated form '2nd' which is more natural in English (2) While 'Second' is accurate, the reference '2nd' is the standard stylistic convention for anniversary celebrations in this context. The deviation is minor but affects naturalness. (3) Uses 'Second Anniversary' instead of '2nd Anniversary'. Reference uses abbreviated ordinal form which is more concise and natural. (4) The translation uses words instead of numbers, which may not align with the reference's style. |
| Qwen/Qwen2.5-7B-Instruct | Happy 2nd Anniversary | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Happy 2nd Anniversary | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Happy Second Anniversary | Minor [Style] "Happy Second Anniversary". Rationale: (1) Uses 'Second' instead of '2nd' as in reference. '2nd' is more concise and commonly used. (2) While 'Second' is correct, '2nd' (as in the reference) is the standard stylistic convention for anniversary celebrations in this context. (3) Uses 'Second' while reference uses '2nd' - numeral form preferred in this context |
| Qwen/Qwen3.5-9B | Happy Second Anniversary | Minor [Style] "Happy Second Anniversary". Rationale: (1) Uses 'Second' instead of '2nd'. Reference uses '2nd Anniversary' which is more concise (2) The source '二周年' often corresponds to '2nd' in celebratory contexts (as seen in the reference). 'Second' is accurate but slightly less idiomatic for anniversary banners/headlines compared to the ordinal number '2nd'. (3) The translation uses 'Second' instead of the more common '2nd' as in the reference. (4) Reference uses abbreviation '2nd' which is more natural in English |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Happy Two Years! | Minor [Style] "Happy Two Years!". Rationale: (1) Unnatural phrasing for an anniversary celebration; 'Happy 2nd Anniversary' is the standard convention. (2) Understandable and close in meaning but less idiomatic than “Happy 2nd Anniversary” in English and omits the notion of ‘anniversary’ explicitly. (3) Less formal and accurate than reference 'Happy 2nd Anniversary'. (4) Reference uses abbreviation '2nd' which is more natural for anniversary context |
| google/gemma-3-12b-it | Happy 2nd Anniversary | Pass (no consensus errors) |
| google/gemma-3-1b-it | Happy two-year anniversary! | Pass (no consensus errors) |
| google/gemma-3-4b-it | Happy Second Anniversary | Minor [Style] "Happy Second Anniversary". Rationale: (1) While 'Second' is correct, '2nd' (as in the reference) is often preferred for anniversaries in UI contexts for brevity, though this is a minor stylistic preference. (2) Style variation: '2nd' is more common abbreviation than 'Second' in anniversary context (3) Hypothesis uses 'Second Anniversary' while reference uses '2nd Anniversary'. Style variation in ordinal representation. |
| google/translategemma-12b-it | Happy 2nd Anniversary | Pass (no consensus errors) |
| google/translategemma-4b-it | Reload | Critical [Accuracy] "Reload". Rationale: (1) Incorrect translation of '二周年快乐' to 'Reload' instead of 'Happy 2nd Anniversary' (2) Completely incorrect translation. Source '二周年快乐' means 'Happy 2nd Anniversary', not 'Reload' (3) Complete mistranslation - source means 'Happy 2nd Anniversary' but hypothesis translates to unrelated word 'Reload' (4) Completely wrong meaning; “二周年快乐” means “Happy 2nd Anniversary”, not “Reload”. (5) Complete mistranslation; source means 'Happy 2nd Anniversary' but hypothesis translates as 'Reload'. |
| tencent/HY-MT1.5-1.8B | Translation of '二周年快乐' in en | Major [Accuracy] "Translation of '二周年快乐' in en". Rationale: (1) Hypothesis is a meta-description rather than an actual translation. Should provide 'Happy 2nd Anniversary' or equivalent. (2) Untranslated placeholder text; does not express 'Happy 2nd Anniversary'. (3) The model output is a meta-commentary describing the task instead of providing the actual translation. It fails to translate the source text into the target language. (4) Partial translation provided instead of the actual term. (5) Hypothesis is a placeholder template rather than actual translation; reference provides correct translation 'Happy 2nd Anniversary' |
| tencent/HY-MT1.5-7B | Happy second anniversary | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Tavily | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Tavily | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Tavily | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Tavily | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Tavily | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Tavily | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Tavily | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Login | Critical [Accuracy] "Login". Rationale: (1) Completely incorrect. Hypothesis is 'Login' but source is 'Tavily' (a proper noun/brand name). (2) Incorrect translation of 'Tavily'. (3) Complete mistranslation - 'Login' has no relation to source 'Tavily' (4) Complete mistranslation; 'Tavily' is a proper noun (brand name) and should not be translated to 'Login'. (5) Completely incorrect rendering; ‘Login’ is unrelated to the proper noun ‘Tavily’, resulting in a wrong term. |
| google/gemma-3-12b-it | Tavily | Pass (no consensus errors) |
| google/gemma-3-1b-it | Tavily | Pass (no consensus errors) |
| google/gemma-3-4b-it | Tavily | Pass (no consensus errors) |
| google/translategemma-12b-it | Tavily | Pass (no consensus errors) |
| google/translategemma-4b-it | Tavily | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Tavily | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Tavily | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Google-Gemini | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Google-Gemini | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Google-Gemini | Minor [Style] "Google-Gemini". Rationale: (1) The hyphen in 'Google-Gemini' may not be appropriate as per the reference. (2) Uses hyphen instead of space. Reference uses 'Google Gemini' with a space, which is the standard formatting for this product name. (3) The hypothesis retains the hyphen from the source, but the official branding and reference use a space ('Google Gemini'). This is a minor formatting/style error regarding proper noun usage. |
| Qwen/Qwen2.5-7B-Instruct | Google-Gemini | Minor [Style] "Google-Gemini". Rationale: (1) Hypothesis uses hyphen 'Google-Gemini' while reference uses space 'Google Gemini' (2) The translation includes a hyphen which is not present in the reference. (3) Reference uses 'Google Gemini' without hyphen while hypothesis uses 'Google-Gemini' with hyphen (4) Source text is Chinese, but target language is specified as Chinese. The hypothesis retains English source text. While proper nouns are often kept, the direction implies translation or localization. However, since the reference also keeps it English, this is acceptable, but the hyphen usage differs from reference. No error based on reference match, but strictly speaking, if target is ZH, English output is an issue unless it's a named entity. Given reference is English, no error. |
| Qwen/Qwen3-14B | Google-Gemini | Minor [Style] "Google-Gemini". Rationale: (1) The translation should match the reference, which is 'Google Gemini' without the hyphen. (2) Hyphen present in hypothesis but not in reference 'Google Gemini' (3) The hypothesis retains a hyphen ('Google-Gemini') which is not standard for this brand name, whereas the reference correctly uses a space ('Google Gemini'). This is a minor punctuation/style error regarding proper noun formatting. |
| Qwen/Qwen3.5-4B | Google-Gemini | Minor [Style] "Google-Gemini". Rationale: (1) Uses hyphen instead of space. Reference uses 'Google Gemini' without hyphen. (2) Includes hyphen while reference uses 'Google Gemini' without hyphen (3) The hyphen in the translation does not match the reference. (4) The hypothesis retains the hyphen from the source, but the standard brand styling (and reference) is 'Google Gemini' without a hyphen. |
| Qwen/Qwen3.5-9B | Google Gemini | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Google-Gemini | Minor [Style] "Google-Gemini". Rationale: (1) Incorrect hyphenation compared to reference 'Google Gemini'. (2) Incorrect punctuation; the standard branding is 'Google Gemini' without a hyphen. (3) Hyphen included in hypothesis but reference uses space 'Google Gemini' |
| google/gemma-3-12b-it | Google-Gemini | Minor [Style] "Google-Gemini". Rationale: (1) The hypothesis retains a hyphen ('Google-Gemini') whereas the standard branding and reference use a space ('Google Gemini'). (2) Should use space instead of hyphen: 'Google Gemini' to match reference (3) Hyphen usage differs from reference - reference uses 'Google Gemini' without hyphen (4) The hyphen in 'Google-Gemini' is not present in the reference 'Google Gemini'. |
| google/gemma-3-1b-it | Google Gemini | Pass (no consensus errors) |
| google/gemma-3-4b-it | Save | Critical [Accuracy] "Save". Rationale: (1) Mistranslation: 'Google-Gemini' translated as 'Save' - completely wrong meaning (2) Hypothesis 'Save' does not match source 'Google-Gemini'. Complete mistranslation. (3) Totally unrelated to the source; 'Google-Gemini' should refer to the product name, not the action 'Save'. (4) The source 'Google-Gemini' is a proper noun and is completely mistranslated as 'Save'. (5) Incorrect translation of 'Google-Gemini'. |
| google/translategemma-12b-it | Google-Gemini | Minor [Style] "Google-Gemini". Rationale: (1) Hyphen used instead of space. Reference uses 'Google Gemini' without hyphen. (2) The translation includes a hyphen while the reference does not. (3) Hyphen present in hypothesis but not in reference 'Google Gemini' |
| google/translategemma-4b-it | Google-Gemini | Minor [Style] "Google-Gemini". Rationale: (1) Hyphen present in hypothesis but not in reference 'Google Gemini' (2) Punctuation error; standard branding uses a space 'Google Gemini' rather than a hyphen. (3) Hyphen used instead of space. Reference uses 'Google Gemini' |
| tencent/HY-MT1.5-1.8B | Google-Gemini | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Google-Gemini | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Error |
致命
[准确性]
"Error"
理由: Hypothesis is completely incorrect. Source '站务' means 'Site Operations', not 'Error'. | Does not convey the meaning of '站务' (site operations/site affairs) and instead returns an unrelated word. | Complete mistranslation - source means 'Site Operations' but hypothesis is just 'Error' | The translation does not match the source meaning. | The source text '站务' (Station Affairs/Site Operations) is completely mistranslated as 'Error', which is unrelated to the source meaning. |
| CohereLabs/tiny-aya-water | Station Affairs |
严重
[准确性]
"Station Affairs"
理由: The source text '站务' refers to website or platform administration operations, not physical train station matters. 'Station Affairs' is a mistranslation of the domain context. | Literal but wrong domain; here ‘站务’ refers to website/site operations, not a physical station. | Hypothesis 'Station Affairs' does not match reference 'Site Operations'. These are significantly different translations of '站务'. | Less accurate than 'Site Operations'. |
| Qwen/Qwen2.5-14B-Instruct | Station Management |
严重
[准确性]
"Station Management"
理由: Translation is inaccurate. 'Station Management' does not convey the meaning of 'Site Operations'. The reference term better captures the intended meaning. | The translation does not match the reference. | ‘站务’ in web/forum context refers to site operations/administration, not physical station management. | The term '站务' in an internet community context refers to 'Site Operations' or 'Forum Administration'. Translating '站' as 'Station' (implying a physical train or police station) is a mistranslation of the domain-specific meaning. |
| Qwen/Qwen2.5-7B-Instruct | Station staff |
严重
[准确性]
"Station staff"
理由: Source text is Chinese, target language is Chinese. Hypothesis translates to English. 'Station staff' is also a mistranslation of '站务' in an internet context (should be Site Administration/Operations). The primary error is wrong target language. | Hypothesis translates as 'Station staff' but reference is 'Site Operations', which is more accurate for the Chinese term 站务 | The translation does not accurately convey the meaning of the source text. | Mistranslation: "站务" in this context means "site operations" or "site affairs", not personnel/"station staff". |
| Qwen/Qwen3-14B | Station Affairs |
严重
[准确性]
"Station Affairs"
理由: The term '站务' in an internet context refers to website administration or site operations. Translating '站' as 'Station' (implying a physical train station or broadcast station) is a mistranslation of the domain context. | The translation should match the reference, which is 'Site Operations'. | Inaccurate translation. '站务' should be translated as 'Site Operations' not 'Station Affairs'. | Domain misinterpretation; here “站务” refers to website/site operations, not physical station affairs. |
| Qwen/Qwen3.5-4B | Station Staff |
严重
[准确性]
"Station Staff"
理由: Incorrect translation. '站务' means 'Site Operations' not 'Station Staff'. | The translation does not match the reference. | Incorrect translation - reference means 'Site Operations' not 'Station Staff' | The term '站务' in an internet context refers to 'Site Operations' or 'Site Administration'. Translating '站' as 'Station' implies a physical transport station, which is a mistranslation of the domain. | "站务" in this context refers to site/forum operations/administration, not physical station staff. |
| Qwen/Qwen3.5-9B | Station Staff |
严重
[准确性]
"Station Staff"
理由: In the context of internet communities (implied by other terms like CSDN, Yuque), '站务' refers to 'Site Administration' or 'Site Operations'. 'Station Staff' incorrectly interprets '站' as a physical train/bus station. | Incorrect translation. '站务' means Site Operations, not Station Staff | “站务” in an online/community context refers to site or forum operations/administration, not physical station staff. | The translation does not match the reference and may not fully capture the intended meaning. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Station staff |
严重
[准确性]
"Station staff"
理由: Wrong sense of '站'; in an internet context, '站务' refers to 'Site Operations' or administration, not physical station staff. | Incorrect translation. Should be 'Site Operations' not 'Station staff'. | Misinterprets “站务”, which here refers to ‘site operations’/administration rather than physical station staff. | Inaccurate translation - '站务' means site operations, not station staff | Less accurate than reference 'Site Operations'. |
| google/gemma-3-12b-it | Site Management |
轻微
[术语]
"Site Management"
理由: While 'Management' is a valid translation for '务', 'Site Operations' is the more standard industry term for '站务' in community/platform contexts. | Term choice differs from reference - reference uses 'Site Operations' while hypothesis uses 'Site Management' | Should be 'Site Operations' to match reference |
| google/gemma-3-1b-it | Station Service |
严重
[术语]
"Station Service"
理由: Hypothesis uses 'Station Service' but reference uses 'Site Operations' - different terminology | The hypothesis uses 'Station Service' while the reference uses 'Site Operations'. | Hypothesis 'Station Service' is incorrect. Reference 'Site Operations' is the proper translation for '站务' in this context. |
| google/gemma-3-4b-it | Administrative Affairs |
严重
[准确性]
"Administrative Affairs"
理由: Hypothesis 'Administrative Affairs' does not match reference 'Site Operations'. Incorrect translation of source term. | Does not match reference 'Site Operations'. | '站务' in this context refers specifically to site/forum operations; 'Administrative Affairs' is overly generic and misses the site-related nuance. |
| google/translategemma-12b-it | Site Management |
轻微
[术语]
"Site Management"
理由: Different but synonymous term - 'Management' vs 'Operations' | The translation uses 'Management' while the reference uses 'Operations'. | Translation differs from reference. 'Site Management' vs 'Site Operations' - different terminology choice. |
| google/translategemma-4b-it | Administrative Affairs | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '站务' in en |
严重
[准确性]
"Translation of '站务' in en"
理由: Hypothesis is a placeholder template rather than actual translation; reference provides correct translation 'Site Operations' | The model output is a meta-commentary describing the task instead of providing the actual translation. | Hypothesis is a meta-description rather than an actual translation. Should provide 'Site Operations' or equivalent. | Partial translation provided instead of the actual term. | Untranslated placeholder text; fails to convey 'Site Operations'. |
| tencent/HY-MT1.5-7B | Station Services |
严重
[准确性]
"Station Services"
理由: The translation 'Station Services' does not fully capture the meaning of the source text '站务', which refers to site operations. | Incorrect translation of '站务'. The hypothesis translates it as 'Station Services' but the reference correctly identifies it as 'Site Operations', which is the proper term in this context. | "站务" in web/forum context refers to site operations/administration, not physical station services. | Mistranslation of '站' (website/platform) as 'Station' (transport/physical), leading to incorrect terminology compared to 'Site Operations'. |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Notice | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Announcement | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Announcement | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Announcement |
轻微
[准确性]
"Announcement"
理由: Singular instead of plural "Announcements" leads to slight number mismatch. | Hypothesis uses singular 'Announcement' while reference uses plural 'Announcements' | Source text is Chinese, target language is Chinese. Hypothesis outputs English instead of Chinese. |
| Qwen/Qwen3-14B | Announcement | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Announcement | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Announcement | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Notice |
轻微
[准确性]
"Notice"
理由: Singular form used instead of plural 'Announcements' as in reference. | Less accurate than reference 'Announcements'. | Uses singular ‘Notice’ where ‘公告’ in this context matches plural ‘Announcements’; slight scope mismatch. |
| google/gemma-3-12b-it | Announcement |
轻微
[准确性]
"Announcement"
理由: The translation is missing the plural form. The correct translation should be 'Announcements'. | The source '公告' often refers to a section or multiple notices; the singular 'Announcement' is slightly less precise than the plural 'Announcements' given the reference. | Singular vs. plural: context and reference suggest a category label meaning “Announcements” rather than a single announcement. |
| google/gemma-3-1b-it | Announcement |
轻微
[准确性]
"Announcement"
理由: Singular vs plural mismatch with reference suggests this is a section label; impact is small but could affect navigation semantics. | Source text is Chinese, target language specified as Chinese. Hypothesis is in English. Contradicts target language constraint. Also, singular vs plural ('Announcements') might be a nuance, but the language mismatch is the primary issue. | Hypothesis uses singular 'Announcement' while reference uses plural 'Announcements', which is more standard for UI labels. |
| google/gemma-3-4b-it | Verification Code |
致命
[准确性]
"Verification Code"
理由: Incorrect translation of '公告'. | Hypothesis 'Verification Code' does not match source '公告' (Announcements). Complete mistranslation. | Wrong concept; '公告' means 'Announcements' or 'Notices', not verification codes. | The source '公告' means 'Announcement', but it is translated as 'Verification Code', which is completely unrelated. | Mistranslation: '公告' (announcements) translated as 'Verification Code' - completely wrong meaning |
| google/translategemma-12b-it | Announcement |
轻微
[术语]
"Announcement"
理由: Singular form used instead of plural 'Announcements' as in reference. | Singular form used while reference uses plural 'Announcements' | The translation uses singular while the reference uses plural. |
| google/translategemma-4b-it | Language |
致命
[准确性]
"Language"
理由: Complete mistranslation; source means 'Announcements' but hypothesis translates as 'Language'. | Complete mistranslation - source means 'Announcements' but hypothesis translates to unrelated word 'Language' | “公告” means ‘Announcements’; “Language” is an unrelated mistranslation. | Completely incorrect translation. Source '公告' means 'Announcements', not 'Language' | Incorrect translation of '公告' to 'Language' instead of 'Announcements' |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: Mistranslation; '公告' means 'Announcements', while hypothesis means 'log in'. | Hypothesis '登录' (login) is completely unrelated to source '公告' (announcements). Should be 'Announcements'. | Hypothesis is completely unrelated to source - '登录' means 'login' but source is '公告' (announcements) | The source '公告' (Announcements) is incorrectly translated as '登录' (Login). This is a severe accuracy error where the meaning is completely changed. |
| tencent/HY-MT1.5-7B | Announcement |
轻微
[准确性]
"Announcement"
理由: Number mismatch; reference uses plural "Announcements" while hypothesis is singular. | The translation is missing the plural form 's' which is present in the reference. | Singular form 'Announcement' used instead of plural 'Announcements' as in reference. |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Please enter your email address. |
致命
[准确性]
"Please enter your email address."
理由: Hypothesis is completely unrelated to source. Source is about 'Bros turning into Hanlin scholars, let's compile books together', but hypothesis is a request for email address. | Complete mistranslation - source means 'Bros turning into Hanlin scholars, let's compile books together' but hypothesis is an unrelated request for email | The translation does not match the source meaning. | The hypothesis is a complete hallucination unrelated to the source text '佬友化身翰林学士,一起来编书了' (Bros turning into Hanlin scholars, let's compile books together). | Completely unrelated to the source sentence about becoming Hanlin scholars and compiling books together. |
| CohereLabs/tiny-aya-water | The scholar of Hanlin Academy, a friend who embodies the spirit of friendship, let's write books together |
严重
[准确性]
"a friend who embodies the spirit of friendship"
理由: Paraphrases and expands ‘佬友’ incorrectly; original means bros/buddies, not this abstract phrase. | Hypothesis significantly misinterprets the source. '佬友化身翰林学士' should be 'Bros turning into Hanlin scholars' not 'The scholar of Hanlin Academy, a friend who embodies the spirit of friendship'. The translation is inaccurate and verbose. | Severe mistranslation of '佬友' (slang for 'bros' or 'fellow users'). The hypothesis incorrectly interprets '友' as 'spirit of friendship' and adds unnecessary explanatory clauses, losing the casual tone. | Overly verbose and less accurate than the reference. | Over-interpreted and mistranslated - '佬友' means 'bros/friends', not 'a friend who embodies the spirit of friendship'; '编书' means 'compile books', not 'write books' |
| Qwen/Qwen2.5-14B-Instruct | Friend transforms into a literary scholar, let's compile books together |
轻微
[风格]
"Friend transforms into a literary scholar"
理由: The translation uses 'Friend' instead of 'Bros', which may not align with the reference's style. | The source '佬友' is a specific community slang term (combining 'old hand/expert' and 'friend'). Translating it as generic 'Friend' loses the specific cultural nuance and camaraderie implied by 'Bros' or 'fellow experts' in the reference. | Loses the colloquial, internet-bro tone of ‘佬友’ and specific cultural reference to ‘翰林学士’; register is more neutral/formal than playful source. | Reference uses more colloquial 'Bros' instead of 'Friend' - slight tone difference |
| Qwen/Qwen2.5-7B-Instruct | Old friends transform into palace scholars, let's start writing books together |
严重
[准确性]
"Old friends"
理由: "佬友" is slang closer to "bros/buddies"; "old friends" changes the colloquial meaning. | Source text is Chinese, target language is Chinese. Hypothesis outputs English instead of Chinese. | Hypothesis translates 佬友 as 'Old friends' and 翰林学士 as 'palace scholars', but reference uses 'Bros' and 'Hanlin scholars' which are more accurate and culturally appropriate |
| Qwen/Qwen3-14B | Lao You transforms into a Hanlin Scholar and joins you in compiling books together. |
轻微
[风格]
"Lao You"
理由: Transliteration loses the colloquial sense of “佬友” as “bros/buddies” used in the reference. | '佬友' is a colloquial term for fellow users or friends in a community. Translating it as 'Lao You' (pinyin) loses the meaning and friendly tone for a target audience unfamiliar with the specific slang, whereas 'Bros' or 'friends' captures the intent better. | Overly literal translation; '佬友' is colloquial 'bros' not formal 'Lao You', and 'joins you' adds nuance not in source | The translation should match the reference, which uses 'Bros'. |
| Qwen/Qwen3.5-4B | The Lao You transforms into a Hanlin Scholar and joins us to compile books. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | The Lao You transforms into an Imperial Scholar and joins you to compile books together. |
轻微
[风格]
"The Lao You transforms into an Imperial Scholar and joins you to compile books together."
理由: Literal and stiff rendering of a playful slogan; loses the informal, collective tone implied by the original (“bros” / “let's”). | Over-interpretation - 'Lao You' is a casual term for friend, 'you' is unnecessary addition, 'Imperial Scholar' is more formal than 'Hanlin' | While '翰林学士' literally means Imperial Scholar, in this playful context, keeping 'Hanlin scholars' (as in reference) might preserve the cultural flavor better, though 'Imperial Scholar' is not strictly wrong.
严重
[准确性]
"The Lao You transforms into an Imperial Scholar"
理由: Incorrect translation of '佬友化身翰林学士'. Should be 'Bros turning into Hanlin scholars' not 'The Lao You transforms into an Imperial Scholar' | The translation does not match the reference and may not fully capture the intended meaning. | '佬友' is a community-specific slang term (likely 'old friends' or 'fellow experts/bros'). Translating it as 'The Lao You' treats it as a proper noun or name, failing to convey the meaning of the community members. 'Bros' in the reference captures the tone better. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Uncle's transformation into a翰林学士, let's come together to compile books. |
严重
[准确性]
"Uncle's"
理由: “佬友” refers to ‘bros’ or ‘friends’, not an uncle or single person; mistranslation of subject. | Incorrect translation of '佬友'. Should be 'Bros' not 'Uncle's'. | Mistranslation of '佬友' (slang for friends/users) as 'Uncle'; failure to translate '翰林学士' (Hanlin scholar) leaving Chinese characters in English text.
轻微
[流畅性]
"transformation into a翰林学士"
理由: Awkward phrasing and terminology compared to reference. | Missing space before the Chinese term and awkward phrasing; could be more natural as ‘into Hanlin scholars’. | Mixed language code-switching. Chinese characters '翰林学士' should be translated to 'Hanlin scholar' not left in source language. |
| google/gemma-3-12b-it | Become a scholar and let's compile a book together. |
严重
[准确性]
"compile a book"
理由: The translation does not capture the correct meaning of the source text. The correct translation should include the context of 'bros turning into Hanlin scholars'. | The hypothesis omits the subject '佬友' (Bros/Friends) and changes the meaning from describing a transformation ('turning into') to an imperative or infinitive ('Become'). | Source and reference imply compiling books (plural) as an activity; singular “a book” narrows the intended meaning. | Missing key cultural reference 'Bros' (佬友) and 'Hanlin scholars' (翰林学士). Should be 'Bros turning into Hanlin scholars, let's compile books together' |
| google/gemma-3-1b-it | Friend transforms itself into an imperial scholar and begins to write books. |
严重
[准确性]
"Friend"
理由: '佬友' refers to bros/members of a community, not a generic 'friend'; the nuance and collective sense are lost. | Hypothesis mistranslates '佬友' as 'Friend' and uses 'itself' (incorrect pronoun). Reference correctly translates as 'Bros' and 'Hanlin scholars' is more accurate than 'imperial scholar'. | Source text is Chinese, target language specified as Chinese. Hypothesis is in English. Contradicts target language constraint. The translation itself is okay for Zh->En, but fails the target language constraint. | The hypothesis uses 'transforms itself' and 'write books' while the reference uses 'turning into' and 'compile books'. |
| google/gemma-3-4b-it | Let's collaborate on writing books together as scholarly figures. | 通过 (无共识错误) |
| google/translategemma-12b-it | Become a scholar and let's write a book together! |
严重
[准确性]
"Become a scholar"
理由: Missing cultural reference '佬友' (bros/friends) and '翰林学士' (Hanlin scholars) - translation loses the playful cultural allusion | Significant deviation from reference. Missing 'Bros' (佬友), 'Hanlin scholars' (翰林学士) is simplified to 'scholar', and 'compile books' (编书) is translated as 'write a book'. | Loses the nuance of “佬友化身翰林学士” which refers to bros/buddies becoming Hanlin scholars; omits the “bros/buddies” aspect and specific cultural reference to Hanlin. | The hypothesis omits the subject '佬友' (Bros/friends) and changes the meaning from describing a transformation ('turning into') to an imperative command ('Become'). |
| google/translategemma-4b-it | Paste |
致命
[准确性]
"Paste"
理由: Complete mistranslation; source is a sentence about scholars compiling books, hypothesis is unrelated command 'Paste'. | Completely incorrect translation. Source '佬友化身翰林学士,一起来编书了' is a full sentence meaning 'Bros turning into Hanlin scholars, let's compile books together', not 'Paste' | Complete mistranslation - source means 'Bros turning into Hanlin scholars, let's compile books together' but hypothesis translates to unrelated word 'Paste' | Source is a full sentence about people becoming Hanlin scholars and compiling books together; “Paste” is entirely unrelated. | Incorrect translation of the source text to 'Paste' instead of the correct reference |
| tencent/HY-MT1.5-1.8B | Lao you transformed into a Hanlin scholar and started writing books together | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Dude transformed into a Hanlin scholar, and now they're working together on writing books. |
轻微
[风格]
"Dude"
理由: Colloquial register differs slightly; less consistent with plural casual tone "Bros" in reference. | 'Dude' is too informal and singular compared to the colloquial but community-focused 'Bros' in the reference. | More verbose than reference; 'Dude' vs 'Bros' and 'working together on writing books' vs 'compile books together' |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Privacy and Security | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Privacy and Security | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Privacy and Security |
轻微
[风格]
"Privacy and Security"
理由: Reference uses ampersand '&' instead of 'and' which is more common in UI context | The hypothesis uses 'and' while the reference uses '&'. In UI labels for 'Privacy & Security', the ampersand is the standard stylistic convention. This is a minor stylistic deviation. | The translation uses 'and' instead of '&' as in the reference. | Uses 'and' instead of '&' symbol. Reference uses '&' which is more concise and commonly used in UI/header contexts. The hypothesis is not incorrect but differs from the reference style convention. |
| Qwen/Qwen2.5-7B-Instruct | Privacy Security | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Privacy & Security | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Privacy Security |
轻微
[风格]
"Privacy Security"
理由: Missing ampersand. Reference uses 'Privacy & Security' which is more standard formatting. | The translation does not include the ampersand from the reference. | While understandable, 'Privacy & Security' is the standard collocation in UI/UX contexts. 'Privacy Security' sounds slightly unnatural as a compound noun phrase. |
| Qwen/Qwen3.5-9B | Privacy Security |
轻微
[风格]
"Privacy Security"
理由: Missing ampersand. Reference uses 'Privacy & Security' which is more standard formatting | The ampersand '&' in the reference is omitted in the translation. | Reference uses '&' which is more concise; 'Privacy & Security' is the standard phrasing |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Privacy and Security |
轻微
[风格]
"Privacy and Security"
理由: Reference uses ampersand '&' instead of 'and' | Should use ampersand 'Privacy & Security' as in reference for consistency. | Minor stylistic deviation; 'Privacy & Security' with an ampersand is the preferred convention for UI headers. |
| google/gemma-3-12b-it | Privacy and Security |
轻微
[风格]
"Privacy and Security"
理由: The use of 'and' instead of '&' deviates from the concise style typically used in UI headers as shown in the reference. | Conjunction style differs - reference uses '&' while hypothesis uses 'and' | Should use ampersand 'Privacy & Security' to match reference |
| google/gemma-3-1b-it | Privacy and security | 通过 (无共识错误) |
| google/gemma-3-4b-it | Buy Now |
致命
[准确性]
"Buy Now"
理由: Incorrect translation of '隐私安全'. | The source '隐私安全' means 'Privacy & Security', but it is translated as 'Buy Now', which is a complete hallucination. | Completely incorrect; '隐私安全' means 'Privacy & Security', not a call-to-action to purchase. | Hypothesis 'Buy Now' does not match source '隐私安全' (Privacy & Security). Complete mistranslation. | Mistranslation: '隐私安全' (privacy & security) translated as 'Buy Now' - completely wrong meaning |
| google/translategemma-12b-it | Privacy and Security | 通过 (无共识错误) |
| google/translategemma-4b-it | Privacy and Security |
轻微
[风格]
"Privacy and Security"
理由: Stylistic inconsistency; reference uses ampersand '&' which is standard for UI labels, hypothesis uses 'and'. | Used 'and' instead of '&'. Reference uses 'Privacy & Security' | Uses 'and' while reference uses '&' for consistency |
| tencent/HY-MT1.5-1.8B | Privacy and Security |
轻微
[风格]
"Privacy and Security"
理由: The hypothesis uses 'and' while the reference uses '&'. In UI contexts, '&' is often preferred for brevity, making 'and' a minor stylistic deviation. | Ampersand vs 'and' - minor stylistic variation from reference 'Privacy & Security' | Hypothesis uses 'and' while reference uses '&'. Both are acceptable but reference format with ampersand is more concise and commonly used in UI contexts. |
| tencent/HY-MT1.5-7B | Privacy and Security |
轻微
[风格]
"Privacy and Security"
理由: Reference uses ampersand '&' which is more concise than 'and' | Used 'and' instead of the ampersand '&' found in the reference, which is often preferred for UI labels. | Uses 'and' instead of '&' as in reference. The reference format 'Privacy & Security' is the standard style for this phrase. |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Translation of 'activity' in en |
轻微
[准确性]
"Translation of 'activity' in en"
理由: Hypothesis is a meta-description rather than a translation. Source '活动' should translate to 'Events', not a description of the translation process. | The translation does not match the source meaning. | The output is a meta-commentary describing the translation task rather than providing the actual translation of '活动' (Events/Activities). | Meta-text about translation rather than translating the term itself; does not convey the intended meaning 'events' or 'activities' in context. |
| CohereLabs/tiny-aya-water | Translation of '活动' in en |
致命
[准确性]
"Translation of '活动' in en"
理由: Meta-text about translation instead of translating the term; does not convey the meaning ‘Events’ or ‘Activities’. | Output contains source language tag instead of actual translation | Incorrect and nonsensical translation. | The output is a meta-commentary describing the translation task rather than providing the actual translation ('Events'). | Hypothesis is a meta-description rather than an actual translation. Should be 'Events' as per reference. |
| Qwen/Qwen2.5-14B-Instruct | Event | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Activity |
严重
[准确性]
"Activity"
理由: Singular "Activity" less faithfully reflects plural sense "Events" in this context. | Hypothesis translates 活动 as 'Activity' but reference uses 'Events' which is more appropriate in this context | Source text is Chinese, target language is Chinese. Hypothesis outputs English instead of Chinese. |
| Qwen/Qwen3-14B | Activity | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Event | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Event | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Activity |
轻微
[准确性]
"Activity"
理由: Incorrect translation. Should be 'Events' not 'Activity'. | Less accurate than reference 'Events'. | “活动” in this UI/context is better rendered as plural ‘Events’; singular ‘Activity’ slightly shifts nuance. | Singular form used where plural 'Events' is standard for a section listing multiple activities. |
| google/gemma-3-12b-it | Event |
轻微
[准确性]
"Event"
理由: The translation is missing the plural form. The correct translation should be 'Events'. | The source '活动' usually refers to a category or list of events; the singular 'Event' is less accurate than the plural 'Events'. | Likely a section or category label that should be plural (“Events”), as in the reference. |
| google/gemma-3-1b-it | Activity |
严重
[准确性]
"Activity"
理由: Likely a section/category label where 'Events' is more appropriate; meaning overlap but not fully equivalent in this context. | Hypothesis 'Activity' is incorrect. Reference 'Events' is the proper translation for '活动' in this context. | Source text is Chinese, target language specified as Chinese. Hypothesis is in English. Contradicts target language constraint. Also, 'Events' is a better translation for '活动' in this context than 'Activity'. |
| google/gemma-3-4b-it | Activity |
轻微
[准确性]
"Activity"
理由: Singular 'Activity' is less natural for the generic menu/context where '活动' typically means 'Events'; meaning overlap but not fully accurate. | Hypothesis 'Activity' does not match reference 'Events'. Incorrect translation of source term. | Does not match reference 'Events'. |
| google/translategemma-12b-it | Events | 通过 (无共识错误) |
| google/translategemma-4b-it | Exit |
致命
[准确性]
"Exit"
理由: “活动” means ‘Events’ or ‘Activities’; “Exit” is a completely wrong meaning. | Complete mistranslation - source means 'Events' but hypothesis translates to unrelated word 'Exit' | Completely incorrect translation. Source '活动' means 'Events', not 'Exit' | Complete mistranslation; source means 'Events' but hypothesis translates as 'Exit'. | Incorrect translation of '活动' to 'Exit' instead of 'Events' |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: Hypothesis '登录' (login) is completely unrelated to source '活动' (events/activities). Should be 'Events'. | Mistranslation; '活动' means 'Events' or 'Activities', while hypothesis means 'log in'. | The source '活动' (Events/Activities) is incorrectly translated as '登录' (Login). This is a complete hallucination and loss of meaning. | Hypothesis is completely unrelated to source - '登录' means 'login' but source is '活动' (events) |
| tencent/HY-MT1.5-7B | Activity |
轻微
[准确性]
"Activity"
理由: Less natural in this context; reference suggests plural "Events" as standard section label. | The translation 'Activity' does not fully capture the meaning of the source text '活动', which refers to events. | Incorrect translation of '活动'. The hypothesis translates it as 'Activity' but the reference correctly identifies it as 'Events', which is the proper term in this context. |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Proxy | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Proxy | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Proxy | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Proxy | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Proxy | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Proxy | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Proxy | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Proxy | 通过 (无共识错误) |
| google/gemma-3-12b-it | Proxy | 通过 (无共识错误) |
| google/gemma-3-1b-it | Proxy | 通过 (无共识错误) |
| google/gemma-3-4b-it | New |
致命
[准确性]
"New"
理由: Incorrect translation; 'Proxy' is a technical term unrelated to 'New'. | Hypothesis 'New' does not match source 'Proxy'. Complete mistranslation. | Mistranslation: 'Proxy' translated as 'New' - completely wrong meaning | The source 'Proxy' is a technical term that is completely mistranslated as 'New'. | Incorrect translation of 'Proxy'. |
| google/translategemma-12b-it | Proxy | 通过 (无共识错误) |
| google/translategemma-4b-it | Proxy | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Proxy | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | 代理 |
严重
[准确性]
"代理"
理由: The source is an English proper noun/term ('Proxy') which should remain untranslated or be transliterated in a Chinese context targeting English output, but here the target language is Chinese. However, the reference keeps it as 'Proxy', indicating a preference for the English loanword in this specific technical context rather than the translation '代理'. | Unnecessary translation; target language is specified as zh, and reference keeps English term "Proxy" as terminology item. | The hypothesis provides a Chinese translation '代理' instead of keeping the English term 'Proxy' as in the reference. The source is already in English and should not be translated to Chinese. | Target is in Chinese (translation) but source and reference are English 'Proxy' - should remain as 'Proxy' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Settings |
致命
[准确性]
"Settings"
理由: The source '领英会员' (LinkedIn Member/Premium) is completely mistranslated as 'Settings', which is unrelated. | Hypothesis is completely incorrect. Source '领英会员' means 'LinkedIn Premium', not 'Settings'. | Incorrect meaning; '领英会员' refers to LinkedIn membership/Premium, not to settings. | The translation does not match the source meaning. | Mistranslation - source means 'LinkedIn Premium' but hypothesis is 'Settings' |
| CohereLabs/tiny-aya-water | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: Incorrect translation; should be 'LinkedIn Premium'. | Hypothesis 'LinkedIn Member' does not match reference 'LinkedIn Premium'. These represent different product offerings. | ‘领英会员’ here refers to paid/premium membership; ‘Member’ misses the paid/premium aspect reflected in the reference. |
| Qwen/Qwen2.5-14B-Instruct | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: The translation does not match the reference. | Translation is inaccurate. '领英会员' refers to 'LinkedIn Premium' (a paid subscription tier), not just a generic 'LinkedIn Member'. | Source and reference denote the paid Premium subscription, not just any LinkedIn member. | The source '领英会员' in a commercial context typically refers to the paid subscription service 'LinkedIn Premium'. 'LinkedIn Member' could imply any free user, failing to capture the specific product meaning intended by the reference. |
| Qwen/Qwen2.5-7B-Instruct | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: Hypothesis translates 领英会员 as 'LinkedIn Member' but reference is 'LinkedIn Premium' which is the correct term for paid membership | Source text is Chinese, target language is Chinese. Hypothesis outputs English instead of Chinese. Additionally, 'Member' is less accurate than 'Premium' for '领英会员' in this context. | The translation does not accurately convey the meaning of the source text. |
| Qwen/Qwen3-14B | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: Mistranslation of “会员”; here it refers to paid “Premium” membership, not just being a member. | Inaccurate translation. '领英会员' should be translated as 'LinkedIn Premium' not 'LinkedIn Member'. | The source '领英会员' in a commercial context typically refers to the paid subscription service ('LinkedIn Premium'). 'LinkedIn Member' simply denotes any registered user, failing to convey the specific product/subscription meaning implied by the reference. |
| Qwen/Qwen3.5-4B | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: Incorrect translation. '领英会员' means 'LinkedIn Premium' not 'LinkedIn Member'. | Mistranslation of "会员" here; refers to paid subscription tier "LinkedIn Premium", not a general member. | Incorrect translation - reference means 'LinkedIn Premium' not 'LinkedIn Member' | '领英会员' in this context specifically refers to the paid subscription service 'LinkedIn Premium'. 'LinkedIn Member' could imply any registered user, failing to capture the specific product meaning. | The translation does not match the reference. |
| Qwen/Qwen3.5-9B | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: Incorrect translation. '领英会员' means LinkedIn Premium, not LinkedIn Member | The translation does not match the reference and may not fully capture the intended meaning. | '领英会员' in a commercial context usually refers to the paid subscription service 'LinkedIn Premium'. 'LinkedIn Member' could imply any registered user (which is free), missing the specific product meaning implied by the reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: “领英会员” here refers specifically to the paid plan ‘LinkedIn Premium’, not a generic member; key marketing term mismatch. | Less accurate than reference 'LinkedIn Premium'. | Incorrect terminology; '领英会员' in this context refers to the paid subscription 'LinkedIn Premium', not just any member. | Incorrect translation. Should be 'LinkedIn Premium' not 'LinkedIn Member'. |
| google/gemma-3-12b-it | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: The translation does not match the reference 'LinkedIn Premium'. | The hypothesis translates '领英会员' literally as 'Member', but in this context, it specifically refers to the paid subscription service 'LinkedIn Premium'. | Incorrect translation. '领英会员' means 'LinkedIn Premium', not 'LinkedIn Member' | “领英会员” here refers specifically to the paid subscription “LinkedIn Premium”; translating as generic “LinkedIn Member” loses the “premium/paid” meaning. |
| google/gemma-3-1b-it | LinkedIn Member |
严重
[术语]
"LinkedIn Member"
理由: Hypothesis uses 'Member' but reference uses 'Premium' - different terminology for LinkedIn subscription tier | The hypothesis uses 'Member' while the reference uses 'Premium'. | Hypothesis 'LinkedIn Member' is incorrect. Reference 'LinkedIn Premium' is the proper translation for '领英会员' which refers to the premium membership tier. |
| google/gemma-3-4b-it | LinkedIn Premium Member |
轻微
[准确性]
"LinkedIn Premium Member"
理由: Does not match reference 'LinkedIn Premium'. | Slight semantic shift; '领英会员' here refers to the premium service itself, not specifically the member, though the intent is close. | Hypothesis 'LinkedIn Premium Member' adds 'Member' which is not in reference 'LinkedIn Premium'. |
| google/translategemma-12b-it | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: Incorrect translation. '领英会员' means 'LinkedIn Premium' not 'LinkedIn Member'. | The source '领英会员' in this context refers to the paid subscription service ('LinkedIn Premium'), but the hypothesis translates it literally as 'LinkedIn Member', which usually refers to any free user. | “领英会员” here refers to the paid premium membership; rendering as “Member” misses the premium/paid aspect conveyed in the reference. |
| google/translategemma-4b-it | LinkedIn Premium | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '领英会员' in en |
严重
[准确性]
"Translation of '领英会员' in en"
理由: The model output is a meta-commentary describing the task instead of providing the actual translation. | Hypothesis is a meta-description rather than an actual translation. Should provide 'LinkedIn Premium' or equivalent. | Untranslated placeholder text; does not convey 'LinkedIn Premium'. | Hypothesis is a placeholder template rather than actual translation; reference provides correct translation 'LinkedIn Premium' | Partial translation provided instead of the actual term. |
| tencent/HY-MT1.5-7B | LinkedIn Member |
严重
[准确性]
"LinkedIn Member"
理由: Incorrect translation of '领英会员'. The hypothesis translates it as 'LinkedIn Member' but the reference correctly identifies it as 'LinkedIn Premium', which is the proper product term. | Mistranslation; "会员" here refers to paid subscription "Premium", not generic member. | Incorrect terminology. '领英会员' in this context refers to the paid subscription service 'LinkedIn Premium', not just any 'Member'. | The translation 'LinkedIn Member' does not accurately convey the meaning of the source text '领英会员', which refers to LinkedIn Premium. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | LDC | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | LDC | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | LDC | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | LDC | 通过 (无共识错误) |
| Qwen/Qwen3-14B | LDC | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | LDC | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | LDC | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Login |
致命
[准确性]
"Login"
理由: Incorrectly translated acronym; ‘LDC’ is a specific proper noun and should be kept as ‘LDC’, not replaced by ‘Login’. | Complete mistranslation; 'LDC' is an acronym/proper noun and should not be translated to 'Login'. | Completely incorrect. Hypothesis is 'Login' but source is 'LDC' (acronym/proper noun). | Incorrect translation of 'LDC'. | Complete mistranslation - 'Login' has no relation to source 'LDC' |
| google/gemma-3-12b-it | LDC | 通过 (无共识错误) |
| google/gemma-3-1b-it | LDC | 通过 (无共识错误) |
| google/gemma-3-4b-it | Local Data Cache |
严重
[准确性]
"Local Data Cache"
理由: Mistranslation: 'LDC' is an acronym that should be kept, translated as 'Local Data Cache' which changes the meaning | Invented expansion of acronym; 'LDC' should remain as is, and 'Local Data Cache' is not equivalent. | The source 'LDC' is an acronym kept as is in the reference. Expanding it to 'Local Data Cache' is an assumption that may be incorrect depending on the specific context (e.g., it could stand for something else), violating the preservation of proper nouns/acronyms. | Hypothesis expands acronym 'LDC' to 'Local Data Cache' while reference keeps it as 'LDC'. Unnecessary expansion. | Does not match reference 'LDC'. |
| google/translategemma-12b-it | LDC | 通过 (无共识错误) |
| google/translategemma-4b-it | Local Data Center |
严重
[准确性]
"Local Data Center"
理由: Expands “LDC” to “Local Data Center” without basis; reference keeps it as “LDC”, so this is an incorrect disambiguation. | Expanded acronym instead of keeping it. Source 'LDC' should remain as 'LDC', not be translated to 'Local Data Center' | Unwarranted expansion; source is the acronym 'LDC' which should remain as is per reference, hypothesis incorrectly expands it. |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: The source 'LDC' (an acronym) is incorrectly translated as '登录' (Login). This is a complete hallucination. | Mistranslation; 'LDC' is a name/acronym to retain, while hypothesis means 'log in'. | Hypothesis is completely unrelated to source - '登录' means 'login' but source is 'LDC' (abbreviation) | Hypothesis '登录' (login) is completely unrelated to source 'LDC' (acronym). Should be 'LDC'. |
| tencent/HY-MT1.5-7B | 助手 |
致命
[准确性]
"助手"
理由: Completely incorrect; "LDC" is an acronym and should be preserved, not translated as "assistant". | The hypothesis provides a Chinese translation '助手' instead of keeping the acronym 'LDC' as in the reference. The source is an acronym and should not be translated. | Complete mistranslation - source 'LDC' should remain unchanged, not translated to '助手' (assistant) | Complete hallucination/mistranslation. 'LDC' is an acronym that should be retained as is, not translated to 'Assistant'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Parrot |
致命
[准确性]
"Parrot"
理由: Hypothesis is a literal translation of the Chinese characters but incorrect. '语雀' is a proper noun (Yuque, a knowledge platform), not 'Parrot'. | Literal meaning of characters but incorrect for the proper noun '语雀', which should be transliterated as 'Yuque'. | The translation does not match the source meaning. | Mistranslation - 'Yuque' (语雀) is a brand name, not 'Parrot' | The source '语雀' is a proper noun (brand name Yuque). Translating it literally or incorrectly as 'Parrot' is a severe entity error; it should remain 'Yuque'. |
| CohereLabs/tiny-aya-water | Sparrow |
严重
[术语]
"Sparrow"
理由: '语雀' is a proper noun (brand name) for a knowledge management tool. It should be transliterated as 'Yuque' or kept as is, not translated literally to 'Sparrow'. | Incorrect translation - '语雀' is the brand name 'Yuque', not 'Sparrow' | Incorrect translation; should be 'Yuque'. |
| Qwen/Qwen2.5-14B-Instruct | Yuque | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Yuque | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Yueque |
轻微
[术语]
"Yueque"
理由: The translation should match the reference, which is 'Yuque'. | Incorrect romanization. '语雀' should be romanized as 'Yuque' not 'Yueque'. | The official English branding for '语雀' is 'Yuque'. 'Yueque' is an incorrect romanization of the brand name. | Incorrect official name of the product; standard form is “Yuque” as in the reference. |
| Qwen/Qwen3.5-4B | Yuque | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Yuque | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Raven |
严重
[准确性]
"Raven"
理由: Incorrect translation of proper noun; '语雀' is the brand name 'Yuque', not 'Raven'. | “语雀” is the product name ‘Yuque’; ‘Raven’ is an incorrect reinterpretation and loses the brand reference. | Completely incorrect translation. Should be 'Yuque' (proper noun/brand name) not 'Raven'. | Mistranslation - '语雀' (Yuque) is a brand name, not translated as 'Raven' |
| google/gemma-3-12b-it | Yiqiu |
严重
[准确性]
"Yiqiu"
理由: Incorrect transliteration. Should be 'Yuque' to match reference | Incorrect transliteration of brand name “语雀”, whose standard English form is “Yuque” as in the reference. | Incorrect transliteration - reference uses 'Yuque' (standard pinyin) while hypothesis uses 'Yiqiu' which is not the correct pronunciation | The hypothesis uses an incorrect Pinyin romanization ('Yiqiu') for the proper noun '语雀', which is officially branded as 'Yuque'. |
| google/gemma-3-1b-it | 雀语 |
致命
[准确性]
"雀语"
理由: Character order is reversed; this becomes a different term and fails to represent the proper noun '语雀' (Yuque). | Source is '语雀' (Yuque, a proper noun). Hypothesis reverses the characters to '雀语', which changes the meaning and is incorrect. Also, if the target language is Chinese, it should remain '语雀' or be translated if the task was Zh->En (but reference is 'Yuque', implying transliteration or keeping the name). Reversing characters is a critical error. | Hypothesis '雀语' is a character reversal of the source '语雀' and is incorrect. Should be 'Yuque' as in reference, which is a proper noun/brand name. | Hypothesis translates '语雀' (Yuque, a product name) to '雀语' which is incorrect; should be 'Yuque' as in reference |
| google/gemma-3-4b-it | Exit |
致命
[准确性]
"Exit"
理由: '语雀' is a proper noun (Yuque, a documentation platform), not the command 'Exit'. | The source '语雀' is a proper noun (a knowledge management tool) and should be transliterated or kept as 'Yuque', not translated as 'Exit'. | Hypothesis 'Exit' does not match source '语雀' (Yuque). Complete mistranslation. | Incorrect translation of '语雀'. | Mistranslation: '语雀' (Yuque - a Chinese product name) translated as 'Exit' - completely wrong meaning |
| google/translategemma-12b-it | Yiqiu |
严重
[准确性]
"Yiqiu"
理由: Incorrect transliteration of the proper noun '语雀'. The standard official English name is 'Yuque'. | Incorrect transliteration. Should be 'Yuque' as in reference, not 'Yiqiu'. | Incorrect transliteration - 'Yiqiu' is wrong, should be 'Yuque' (pinyin of 语雀) |
| google/translategemma-4b-it | Yaqq |
严重
[准确性]
"Yaqq"
理由: Incorrect transliteration. Source '语雀' should be transliterated as 'Yuque', not 'Yaqq' | “语雀” is the product name “Yuque”; “Yaqq” is an incorrect transliteration/spelling. | Incorrect transliteration - '语雀' should be 'Yuque' not 'Yaqq' | Incorrect translation of '语雀' to 'Yaqq' instead of 'Yuque' | Incorrect proper noun translation; the official English name for '语雀' is 'Yuque', not 'Yaqq'. |
| tencent/HY-MT1.5-1.8B | 语雀的翻译是 |
致命
[准确性]
"语雀的翻译是"
理由: Hypothesis is a meta-description in Chinese rather than an actual translation. Should provide 'Yuque' or equivalent English translation. | Meta-statement 'the translation of Yuque is' without providing the translation; meaning 'Yuque' is not conveyed. | The output is a sentence fragment meaning 'The translation of Yuque is', which is meta-commentary rather than the required translation of the proper noun 'Yuque'. | Hypothesis is a placeholder template '语雀的翻译是' (translation of 语雀 is) rather than actual translation; reference provides correct translation 'Yuque' | Partial translation provided instead of the actual term. |
| tencent/HY-MT1.5-7B | Yuque | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Self-study | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Self-study | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Self-study | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Self-study | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Self-study | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Self-study | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Self-study | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Self-study | 通过 (无共识错误) |
| google/gemma-3-12b-it | Self-study | 通过 (无共识错误) |
| google/gemma-3-1b-it | Self-study | 通过 (无共识错误) |
| google/gemma-3-4b-it | Self-study | 通过 (无共识错误) |
| google/translategemma-12b-it | Self-study | 通过 (无共识错误) |
| google/translategemma-4b-it | Profile |
致命
[准确性]
"Profile"
理由: Complete mistranslation; source means 'Self-study' but hypothesis translates as 'Profile'. | Complete mistranslation - source means 'Self-study' but hypothesis translates to unrelated word 'Profile' | Incorrect translation of '自学' to 'Profile' instead of 'Self-study' | “自学” means ‘Self-study’; “Profile” is unrelated. | Completely incorrect translation. Source '自学' means 'Self-study', not 'Profile' |
| tencent/HY-MT1.5-1.8B | Self-study | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Self-study | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Learning | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Learning | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Study |
轻微
[准确性]
"Study"
理由: Source “学习” in this context aligns better with “Learning” than “Study”; meaning is close but not fully matching the intended nuance. | Translation does not match reference 'Learning' - different word with different meaning | The source '学习' can mean both 'study' and 'learning'. While 'Study' is a valid translation, the reference 'Learning' suggests a context of the process or outcome rather than the act of studying. Without context, this is a minor nuance issue, but potentially a mismatch in intended meaning (noun vs verb implication or process vs action). | Hypothesis 'Study' differs from reference 'Learning'. Both are valid translations of '学习', but reference is preferred. |
| Qwen/Qwen2.5-7B-Instruct | study |
严重
[准确性]
"study"
理由: Translation does not match reference; reference is 'Learning' but hypothesis is 'study' | Hypothesis uses 'study' while reference uses 'Learning'. Both are valid translations of '学习', but 'Learning' is more commonly used as a noun in English contexts. | Source text '学习' is Chinese, but hypothesis 'study' is English. The translation is in the wrong direction (target should be Chinese). |
| Qwen/Qwen3-14B | Study |
轻微
[准确性]
"Study"
理由: Hypothesis uses 'Study' (verb) while reference uses 'Learning' (noun). Both are valid translations of '学习' but differ in form and connotation. | The source '学习' is a gerund/noun contextually better translated as 'Learning' (matching the reference) rather than the verb/base form 'Study'. | The translation uses a verb form instead of a gerund, which may not fully align with the reference. | The source "学习" is better rendered as the noun "Learning" here; "Study" suggests a verb or a specific act rather than the general concept. | Translation is a literal translation rather than the reference 'Learning' |
| Qwen/Qwen3.5-4B | Learn |
轻微
[准确性]
"Learn"
理由: Source '学习' is a noun or gerund concept; 'Learn' is a verb. Reference 'Learning' captures the nominal nature better. | The source “学习” is better rendered as the noun “Learning” in this UI-like context; “Learn” is a verb and mismatches the intended part of speech. | Translation differs from reference 'Learning' - different tense/form | Hypothesis uses verb form 'Learn' while reference uses gerund 'Learning'. For the Chinese noun '学习', the gerund form is more appropriate in English. |
| Qwen/Qwen3.5-9B | Learn |
轻微
[准确性]
"Learn"
理由: Source '学习' is a noun meaning 'learning', but hypothesis 'Learn' is a verb form. Should be 'Learning' to match reference. | The source '学习' is a noun or gerund in this context, matching the reference 'Learning'. The hypothesis 'Learn' is a verb, creating a part-of-speech mismatch. | The source and reference use the nominal form (“学习” / “Learning”), while the hypothesis uses the verb form “Learn”, causing a slight mismatch in part of speech. | Translation is a verb form while reference is a gerund; slight mismatch in part of speech |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Learning | 通过 (无共识错误) |
| google/gemma-3-12b-it | Learn |
轻微
[准确性]
"Learn"
理由: Word form mismatch: 'Learn' vs 'Learning' - verb vs gerund form | Source is a noun (学习), hypothesis is a verb form. Should be 'Learning' to match reference. | The source '学习' is a noun or gerund in this context, matching the reference 'Learning'. The hypothesis 'Learn' is a verb form, creating a part-of-speech mismatch. | The source is a gerund-like noun (“学习” as a concept), better rendered as “Learning” rather than the imperative/verb form “Learn”. |
| google/gemma-3-1b-it | Learning | 通过 (无共识错误) |
| google/gemma-3-4b-it | Submit |
致命
[准确性]
"Submit"
理由: Translation is completely incorrect. '学习' means 'Learning' not 'Submit' | The source text '学习' means 'Learning', but the hypothesis translates it as 'Submit', which is completely unrelated. | Complete mistranslation; “学习” should be rendered as “Learning/Study”, whereas the current output is a UI action command with a different meaning. | The translation does not match the source meaning. | Complete mistranslation - source means 'Learning' but hypothesis is 'Submit' |
| google/translategemma-12b-it | Learn |
轻微
[准确性]
"Learn"
理由: Word form mismatch: hypothesis uses verb 'Learn' while reference uses noun 'Learning' | Source '学习' is a noun meaning 'learning', but hypothesis 'Learn' is a verb form. Should be 'Learning' to match reference. | The source '学习' is a noun or gerund in this context, matching the reference 'Learning'. The hypothesis 'Learn' is a verb form, creating a part-of-speech mismatch. | Source "学习" is better rendered as the gerund/noun "Learning" rather than the imperative/verb "Learn" in this context, as shown by the reference. |
| google/translategemma-4b-it | Exit |
致命
[准确性]
"Exit"
理由: Hypothesis 'Exit' does not match source '学习' (Learning) or reference 'Learning'. Completely incorrect translation. | The translation is incorrect. 'Exit' does not match the source '学习' or the reference 'Learning'. | The source text '学习' means 'Learning', but the hypothesis translates it as 'Exit', which is completely unrelated. | Completely wrong translation - 'Exit' does not mean '学习' (Learning) | Mistranslation: source means 'study/learning' but hypothesis is 'Exit', changing the meaning completely. |
| tencent/HY-MT1.5-1.8B | Learning |
致命
[准确性]
"Learning"
理由: Source is in Chinese (学习) but hypothesis is in English. Target language is zh, so output should be in Chinese or a proper translation format, not English. | Target language is specified as Chinese (zh) but the output is in English; it should be a Chinese rendering of “学习” or left as-is in Chinese, not translated into English. | The hypothesis is in English, but the target language is Chinese. The source '学习' should be translated to Chinese, not kept or converted to English. |
| tencent/HY-MT1.5-7B | Learning | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | skill | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Skill | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | skill |
轻微
[风格]
"skill"
理由: Capitalization difference from reference 'Skill' | Hypothesis uses lowercase 'skill' while reference uses capitalized 'Skill'. Capitalization mismatch. | The hypothesis uses lowercase 'skill' while the reference uses title case 'Skill'. In UI contexts or standalone labels, capitalization is often required for consistency. |
| Qwen/Qwen2.5-7B-Instruct | skill |
轻微
[风格]
"skill"
理由: Hypothesis uses lowercase 'skill' while reference uses capitalized 'Skill'. Capitalization inconsistency. | Lowercase instead of uppercase | Capitalization difference; reference is 'Skill' |
| Qwen/Qwen3-14B | skill |
轻微
[风格]
"skill"
理由: Hypothesis uses lowercase 'skill' while reference uses capitalized 'Skill'. Capitalization inconsistency. | Inconsistent capitalization compared to the reference 'Skill'; UI labels typically use Title Case. | Capitalization mismatch with reference 'Skill' |
| Qwen/Qwen3.5-4B | skill |
轻微
[风格]
"skill"
理由: Inconsistent capitalization compared to reference 'Skill' for a standalone label. | Capitalization difference from reference 'Skill' | Hypothesis uses lowercase 'skill' while reference uses capitalized 'Skill'. Capitalization is inconsistent with reference. |
| Qwen/Qwen3.5-9B | Skill | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | skill | 通过 (无共识错误) |
| google/gemma-3-12b-it | skill | 通过 (无共识错误) |
| google/gemma-3-1b-it | skill |
轻微
[风格]
"skill"
理由: Capitalization mismatch: hypothesis uses lowercase 'skill' while reference uses capitalized 'Skill' | Capitalization difference: 'skill' vs 'Skill' | Capitalization differs from reference. |
| google/gemma-3-4b-it | Invalid input |
致命
[准确性]
"Invalid input"
理由: Translation is completely incorrect. 'skill' should be translated as 'Skill' not 'Invalid input' | The source text 'skill' is translated as 'Invalid input', which is a complete hallucination and loss of meaning. | The translation does not match the source meaning. | Complete mistranslation - source means 'Skill' but hypothesis is 'Invalid input' | “skill” should be rendered as “Skill/技能” but was translated as “Invalid input”; the meaning does not match at all. |
| google/translategemma-12b-it | skill | 通过 (无共识错误) |
| google/translategemma-4b-it | Next |
致命
[准确性]
"Next"
理由: Hypothesis 'Next' does not match source 'skill' or reference 'Skill'. Completely incorrect translation. | The translation is incorrect. 'Next' does not match the source 'skill' or the reference 'Skill'. | Completely wrong translation - 'Next' does not mean 'skill' (Skill) | The source text 'skill' is translated as 'Next', which is a complete mistranslation of the meaning. | Mistranslation: source 'skill' is rendered as 'Next', which is unrelated in meaning. |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: Hypothesis '登录' (login) does not match source 'skill' or reference 'Skill'. Completely incorrect translation. | “skill” should be translated as “技能” or an equivalent term; “登录” means “log in” and is semantically incorrect. | Mistranslation: '登录' (login) does not match source 'skill' or reference 'Skill' | Mistranslation: 'skill' was translated as '登录' (Login) instead of '技能'. Additionally, the output is in Chinese while the reference suggests an English target, indicating a potential directionality error or severe hallucination. | Incorrect translation of 'skill' as '登录' (login) |
| tencent/HY-MT1.5-7B | 助手 |
严重
[准确性]
"助手"
理由: Hypothesis translates to 'assistant' but source is 'skill' and reference is 'Skill'. Complete mistranslation. | Incorrect translation of 'skill' as '助手' (assistant) instead of 'skill'. | Mistranslation: 'skill' translated to 'assistant' instead of 'Skill' | Mistranslation; “skill” should be translated as “技能” or similar, not “助手” (assistant). | The source 'skill' is translated as 'assistant' (助手), which is a complete mistranslation of the meaning. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Translation of '翻译' |
致命
[准确性]
"Translation of '翻译'"
理由: Hypothesis includes meta-text 'Translation of' which should not be in the output; should be just 'Translation' | The hypothesis contains meta-text explaining the translation rather than providing the actual translation in the target language (Chinese). It also fails to output Chinese. | Over-translation: added explanatory text that is not in reference |
| CohereLabs/tiny-aya-water | Translation | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Translation | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Translation | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Translate |
轻微
[准确性]
"Translate"
理由: Hypothesis uses 'Translate' (verb) while reference uses 'Translation' (noun). The source '翻译' is more naturally translated as the noun 'Translation'. | Verb form used instead of noun 'Translation' | The translation uses a verb form instead of a noun, which may not fully align with the reference. | The source '翻译' as a noun concept is better rendered as 'Translation' (matching reference) rather than the verb 'Translate'. | "翻译" in isolation is nominal in this UI-like context and is better translated as "Translation" rather than the verb "Translate". |
| Qwen/Qwen3.5-4B | Translate |
严重
[准确性]
"Translate"
理由: “翻译” in isolation typically refers to the noun “translation” in this context; “Translate” is a verb and changes the grammatical function and likely UI meaning. | The translation uses a verb form, while the reference uses a noun form, which changes the part of speech and meaning. | Source '翻译' functions as a noun here; 'Translate' is a verb. Reference 'Translation' is the correct nominal form. | Translation differs from reference 'Translation' - verb vs noun | Hypothesis uses verb form 'Translate' while reference uses noun form 'Translation'. For the Chinese noun '翻译', the noun form is more appropriate. |
| Qwen/Qwen3.5-9B | Translate |
轻微
[准确性]
"Translate"
理由: The Chinese “翻译” and the reference are in noun form (“Translation”), but the hypothesis uses the verb “Translate”, which changes the part of speech. | Verb form used instead of noun form in reference | The source '翻译' functions as a noun here, aligned with the reference 'Translation'. The hypothesis 'Translate' is a verb, resulting in a grammatical category error. | Source '翻译' is a noun meaning 'translation', but hypothesis 'Translate' is a verb form. Should be 'Translation' to match reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Translation | 通过 (无共识错误) |
| google/gemma-3-12b-it | Translate |
轻微
[准确性]
"Translate"
理由: The source '翻译' functions as a noun (concept) here, aligned with the reference 'Translation'. The hypothesis 'Translate' is a verb, causing a grammatical category error. | Source is a noun (翻译), hypothesis is a verb form. Should be 'Translation' to match reference. | Word form mismatch: 'Translate' (verb) vs 'Translation' (noun) | “翻译” used as a noun should be “Translation”; “Translate” is a verb form and changes the part of speech. |
| google/gemma-3-1b-it | Translation of '翻译' in en |
严重
[流畅性]
"Translation of '翻译' in en"
理由: Contains meta-text ('Translation of ... in en') instead of a clean translation, which is ungrammatical and not suitable as UI text. | Unnecessary addition of 'in en'. | Unnecessary meta-commentary in translation |
| google/gemma-3-4b-it | Translate |
严重
[准确性]
"Translate"
理由: Incorrect word form. '翻译' as a noun means 'Translation' not 'Translate' (verb) | The source is the noun “翻译” and the reference is “Translation”, but the hypothesis uses the verb “Translate”; the part of speech and meaning do not match. | Translation is a verb form rather than the noun 'Translation' in reference | The source '翻译' can mean 'Translation' (noun) or 'Translate' (verb). Given the reference 'Translation', the noun form is likely intended for a UI label, making the verb form 'Translate' a minor accuracy/style issue, though often acceptable. However, strictly following the reference context implies a noun. |
| google/translategemma-12b-it | Translate |
轻微
[准确性]
"Translate"
理由: Word form mismatch: hypothesis uses verb 'Translate' while reference uses noun 'Translation' | The source '翻译' functions as a noun (concept) here, aligned with reference 'Translation'. The hypothesis 'Translate' is a verb, causing a grammatical category error. | "翻译" here is used as a noun; "Translation" matches the reference better than the verb form "Translate". | Source '翻译' is a noun meaning 'translation', but hypothesis 'Translate' is a verb form. Should be 'Translation' to match reference. |
| google/translategemma-4b-it | Message |
致命
[准确性]
"Message"
理由: Completely wrong translation - 'Message' does not mean '翻译' (Translation) | The translation is incorrect. 'Message' does not match the source '翻译' or the reference 'Translation'. | Mistranslation: source '翻译' means 'translation', not 'message'. | The source text '翻译' means 'Translation', but the hypothesis uses 'Message', which is semantically incorrect. | Hypothesis 'Message' does not match source '翻译' (Translation) or reference 'Translation'. Completely incorrect translation. |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
致命
[准确性]
"Translation of '登录' in en"
理由: Source is “翻译” (translation); hypothesis introduces wrong content ('登录') and a metalinguistic template instead of a direct translation, changing the meaning entirely. | Incorrect translation of '翻译' as 'Translation of '登录' in en' | Mistranslation: hypothesis contains meta-text rather than actual translation; source is '翻译', reference is 'Translation' | Hypothesis is a meta-description rather than actual translation. Should be 'Translation' not a description of translation process. | The hypothesis is a meta-commentary explaining a translation task rather than providing the actual translation. It also references the wrong source word ('登录' instead of '翻译'). |
| tencent/HY-MT1.5-7B | Translation | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | C# (pronounced /sɛs/ or /sə/) is a modern general-purpose programming language developed by Microsoft for its .NET Framework and also used as a language to develop applications for the Apple iOS platform. |
致命
[准确性]
"C# (pronounced /sɛs/ or /sə/) is a modern general-purpose programming language developed by Microsoft for its .NET Framework and also used as a language to develop applications for the Apple iOS platform."
理由: Major mistranslation: full sentence provided instead of the short term 'C#' | Incorrect expansion of 'CSharp' | Adds extensive information and claims (including pronunciation and iOS usage) not present in the source, going far beyond its meaning. | Hypothesis provides extensive explanation instead of simple translation 'C#'; completely over-expanded content | The hypothesis provides an unsolicited encyclopedic definition instead of translating the source term. Additionally, it is in English, not the required target language Chinese. |
| CohereLabs/tiny-aya-water | C# (pronounced /sɛs/ or /siːs/) is a modern, strongly typed programming language developed by Microsoft for its .NET Framework. |
严重
[准确性]
"C# (pronounced /sɛs/ or /siːs/) is a modern, strongly typed programming language developed by Microsoft for its .NET Framework."
理由: Over-translation: hypothesis contains full explanation while reference is just 'C#' | Hypothesis provides extensive explanation instead of simple translation. Source is 'CSharp' which should translate to 'C#', not a full definition. | The hypothesis provides an explanation instead of the correct term 'C#'. | Adds a long descriptive sentence that is not present in the source, going beyond the intended short label. | The hypothesis adds extensive information not present in the source ('CSharp') and is in English instead of the required target language Chinese (zh). |
| Qwen/Qwen2.5-14B-Instruct | CSharp |
轻微
[术语]
"CSharp"
理由: The translation uses 'CSharp' instead of 'C#', which is a minor formatting difference. | The standard branding and terminology for the language is 'C#', not 'CSharp'. While 'CSharp' is understood, it deviates from the official trademarked name. | Standard term is “C#”; “CSharp” is a nonstandard variant and may cause confusion in technical contexts. | Should be 'C#' as per reference - incorrect technical term |
| Qwen/Qwen2.5-7B-Instruct | CSharp |
严重
[术语]
"CSharp"
理由: Non-standard spelling of the product/technology name; should match 'C#' per reference. | Incorrect term; should be 'C#' | Incorrect terminology; reference is 'C#' but hypothesis is 'CSharp' |
| Qwen/Qwen3-14B | C# | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | CSharp |
严重
[术语]
"CSharp"
理由: Incorrect terminology - should be 'C#' per reference | Standard branding for the language is 'C#', not 'CSharp'. | Hypothesis uses 'CSharp' while reference uses standard terminology 'C#'. The correct notation for the programming language is C#. | The translation should use 'C#' as per the reference, which is the correct and widely accepted notation. | Standard term is the symbol form “C#”; “CSharp” is a nonstandard spelling even if understandable. |
| Qwen/Qwen3.5-9B | C# | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | C# | 通过 (无共识错误) |
| google/gemma-3-12b-it | CSharp |
严重
[术语]
"CSharp"
理由: Incorrect terminology: 'CSharp' should be 'C#' (official name) | Incorrect terminology. Should be 'C#' as per reference and standard convention. | Standard term is “C#”; “CSharp” deviates from the conventional product/technology naming. | The hypothesis uses 'CSharp' instead of the correct 'C#'. | The standard branding and terminology for the language is 'C#', not 'CSharp'. The hypothesis fails to use the official symbol. |
| google/gemma-3-1b-it | C# | 通过 (无共识错误) |
| google/gemma-3-4b-it | CSharp |
轻微
[术语]
"CSharp"
理由: Incorrect terminology. 'CSharp' should be formatted as 'C#' which is the standard notation | Does not use the standard form 'C#'; although understandable, it is inconsistent with the reference's terminology convention. | Should use 'C#' (with hash) not 'CSharp' as per reference and standard terminology | While 'CSharp' is understandable, the standard branding and reference use 'C#'. | The translation uses 'CSharp' instead of the more common 'C#'. |
| google/translategemma-12b-it | CSharp |
严重
[术语]
"CSharp"
理由: Hypothesis 'CSharp' does not match reference 'C#'. The correct notation for the programming language is 'C#'. | Incorrect terminology: hypothesis uses 'CSharp' while reference uses correct 'C#' | The hypothesis uses 'CSharp' instead of the correct 'C#'. | Standard term is "C#"; "CSharp" is a nonstandard spelling and does not match the reference terminology. | Non-standard terminology. The official and preferred name for the language is 'C#', as shown in the reference. |
| google/translategemma-4b-it | Delete |
致命
[准确性]
"Delete"
理由: The source text 'CSharp' is a programming language name, but the hypothesis translates it as 'Delete', which is unrelated. | Wrong translation - 'Delete' does not mean 'CSharp' (C#) | The translation is incorrect. 'Delete' does not match the source 'CSharp' or the reference 'C#'. | Hypothesis 'Delete' does not match source 'CSharp' or reference 'C#'. Completely incorrect translation. | Mistranslation: 'CSharp' (C#) is a programming language name, not the command 'Delete'. |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: Source is “CSharp” (programming language C#); “登录” means “log in” and is unrelated to the source term. | Incorrect translation of 'CSharp' as '登录' (login) | Mistranslation: '登录' (login) does not match source 'CSharp' or reference 'C#' | Hypothesis '登录' (login) does not match source 'CSharp' or reference 'C#'. Completely incorrect translation. | Hallucination/Mistranslation: 'CSharp' was translated as '登录' (Login), which is completely unrelated. Proper nouns/technical terms should be preserved or transliterated. |
| tencent/HY-MT1.5-7B | CSharp |
轻微
[术语]
"CSharp"
理由: Hypothesis uses 'CSharp' but reference uses 'C#' which is the standard notation for the programming language. | The standard branding for the programming language is 'C#', not 'CSharp'. | Use of 'CSharp' instead of the standard 'C#'. | Nonstandard representation; the common form is “C#” as in the reference, though meaning is clear. | Inconsistent terminology: 'CSharp' vs reference 'C#' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | About Us |
致命
[准确性]
"About Us"
理由: Completely incorrect translation of '打卡' | Hypothesis 'About Us' is completely incorrect translation of '打卡' (check-in); unrelated content | Mistranslation; source means 'Check-in' while hypothesis means 'About Us', changing the meaning completely. | The translation is completely unrelated to the source meaning ('打卡' means check-in/clock in). It also fails to translate into Chinese. | Completely wrong translation: 'About Us' instead of 'Check-in' |
| CohereLabs/tiny-aya-water | Check-in | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Clock In |
轻微
[术语]
"Clock In"
理由: The translation uses 'Clock In' instead of 'Check-in', which is a slight variation but still accurate. | Term 'Clock In' differs from reference 'Check-in' - both are valid translations but reference prefers 'Check-in' | While 'Clock In' is a valid translation for '打卡', 'Check-in' is often preferred in modern app contexts for attendance or location verification. The difference is subtle and context-dependent, but 'Check-in' is the referenced standard here. |
| Qwen/Qwen2.5-7B-Instruct | Check-in | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Clock In |
轻微
[准确性]
"Clock In"
理由: "打卡" in app/website contexts usually refers to "Check-in" (e.g., attendance, sign-in) rather than specifically employment-related "Clock In". | Hypothesis 'Clock In' differs from reference 'Check-in'. Both are valid but 'Check-in' is the more standard translation for '打卡' in this context. | The translation uses a different phrasing than the reference. |
| Qwen/Qwen3.5-4B | Clock In | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Clock in | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Check-in | 通过 (无共识错误) |
| google/gemma-3-12b-it | Clock In | 通过 (无共识错误) |
| google/gemma-3-1b-it | 打卡 |
严重
[准确性]
"打卡"
理由: The term is left untranslated Chinese while an English translation ('Check-in') is required for the target. | Does not translate to 'Check-in'. | The hypothesis is identical to the source (Chinese). The metadata states 'Target Language: zh', but the references are all in English (e.g. 'Learning', 'Check-in', 'Douyin / TikTok'), so the metadata is incorrect and the actual task is Chinese (zh) to English (en). Evaluated on that basis, the hypothesis leaves the source term untranslated. |
| google/gemma-3-4b-it | Delete |
致命
[准确性]
"Delete"
理由: The source '打卡' means 'Check-in', but the hypothesis translates it as 'Delete', which is the opposite action in many contexts and semantically incorrect. | Here '打卡' means 'Check-in' (签到), which does not match 'Delete' (删除) at all. | Translation is completely incorrect. '打卡' means 'Check-in' not 'Delete' | The translation does not match the source meaning. | Complete mistranslation - source means 'Check-in' but hypothesis is 'Delete' |
| google/translategemma-12b-it | Clock in | 通过 (无共识错误) |
| google/translategemma-4b-it | Clock In | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: Mistranslation: '打卡' (Check-in/Clock in) was incorrectly translated as '登录' (Login). | “打卡” in this context corresponds to “check-in” or “clock in”; “登录” (log in) is a different action and thus inaccurate. | Mistranslation: '登录' (login) does not match source '打卡' or reference 'Check-in' | Hypothesis '登录' (login) does not match source '打卡' (check-in) or reference 'Check-in'. Completely incorrect translation. | Incorrect translation of '打卡' as '登录' (login) |
| tencent/HY-MT1.5-7B | Card punching |
严重
[准确性]
"Card punching"
理由: Incorrect translation of '打卡' as 'Card punching' instead of 'Check-in'. | 'Card punching' is a literal and awkward translation of '打卡'. In modern contexts (apps, attendance), 'Check-in' is the accurate equivalent. | Hypothesis 'Card punching' does not match reference 'Check-in'. The term '打卡' means check-in in modern context, not card punching. | Awkward and misleading rendering of “打卡”; “Check-in” or similar would be appropriate. Current translation suggests physical card punching rather than check-in. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Provide technical support, enable JavaScript for best experience |
轻微
[流畅性]
"Provide technical support, enable JavaScript for best experience"
理由: Minor wording difference: 'for best experience' vs 'for the best experience' | Missing final period and slightly less natural than the reference; meaning is preserved. | Less fluent than the reference |
| CohereLabs/tiny-aya-water | Provide technical support, enable JavaScript for best experience |
轻微
[流畅性]
"for best experience"
理由: Missing article 'the' before 'best experience' | Missing period at the end. | Missing article "the" before "best experience"; should be "for the best experience" to match natural English. |
| Qwen/Qwen2.5-14B-Instruct | Provide technical support, enable JavaScript for the best experience | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Provide technical support, enable JavaScript for the best experience | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Get technical support, enable JavaScript for the best experience |
轻微
[准确性]
"Get technical support"
理由: Word order differs from reference 'Tech support provided' | Source states that technical support is provided, not an imperative to get it; closer would be "Provide technical support" or "Technical support provided". | Hypothesis translates '提供技术支持' as 'Get technical support' but reference translates it as 'Tech support provided'. The source means 'provide' not 'get', indicating a semantic error. | The source implies a statement of availability ('Technical support provided'), whereas the hypothesis translates it as an imperative command ('Get technical support'). |
| Qwen/Qwen3.5-4B | Provide technical support, enable JavaScript for the best experience | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Provide technical support, enable JavaScript for the best experience | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Provide technical support, enable JavaScript for the best experience. |
严重
[准确性]
"Provide technical support,"
理由: Translates the original as an imperative instead of describing that tech support is provided; slight change in sentence mood/meaning. | Meaning shift: hypothesis translates as 'Provide technical support' but reference indicates 'Tech support provided' (passive voice, different meaning) | Source is Chinese, target language is Chinese. Hypothesis is English. Complete failure to translate. |
| google/gemma-3-12b-it | Provide technical support, enable JavaScript for the best experience | 通过 (无共识错误) |
| google/gemma-3-1b-it | Provide technical support and enable JavaScript to get the best experience. |
严重
[准确性]
"Provide technical support and"
理由: Hypothesis translates '提供技术支持' as 'Provide technical support' but reference uses 'Tech support provided' with different structure and meaning | The source text '提供技术支持' in this context (paired with enabling JavaScript) is an imperative instruction to the user to 'Provide [your own] technical support' (i.e., ensure support is available/active) or more likely a mistranslation of a status message. However, given the reference 'Tech support provided', the source is likely a fragment meaning 'Technical support [is] provided'. The hypothesis interprets it as a command to the user to 'Provide technical support', which changes the meaning entirely. Alternatively, if the source is an instruction to the user to 'Enable JS to get support', the phrasing is awkward. The most critical error is interpreting a status or condition as a user command to 'Provide' support. | Slight change in information structure compared with source/reference; implies providing support rather than stating it is provided, but overall meaning is mostly preserved. |
| google/gemma-3-4b-it | OK |
致命
[准确性]
"OK"
理由: The hypothesis 'OK' completely omits the detailed instruction and information present in the source text regarding tech support and JavaScript. | The long sentence '提供技术支持,启用 JavaScript 以获得最佳体验' is collapsed into 'OK', losing almost all of the information. | Translation is completely incorrect. The source is a full sentence about tech support and JavaScript, not a simple 'OK' | Complete mistranslation - full sentence translated to single word 'OK' | The translation does not match the source meaning. |
| google/translategemma-12b-it | Provides technical support; enable JavaScript for the best experience. |
轻微
[流畅性]
"Provides technical support"
理由: The sentence fragment would be more natural as an imperative ("Provide technical support" or "Tech support provided") to match the reference style; current phrasing is slightly awkward in isolation. | Awkward phrasing and punctuation. The source implies a status or feature description ('Tech support provided'), whereas 'Provides technical support' sounds like an active verb phrase describing an agent. The semicolon usage is also less natural than a period or separate sentence structure. | The hypothesis uses a semicolon instead of a period as in the reference. |
| google/translategemma-4b-it | Provide technical support, enable JavaScript for optimal experience | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Provide technical support and enable JavaScript for an optimal experience |
严重
[准确性]
"Provide technical support and enable JavaScript for an optimal experience"
理由: Wrong target language: The hypothesis is in English, but the target language is Chinese. The source text should have been translated into Chinese. | Overall meaning is preserved, but the Chinese source more literally says that technical support is provided and instructs to enable JavaScript; the English rephrasing is slightly more directive and blends clauses, though impact is small. | Hypothesis is in English but source is in Chinese and target language is zh. Should provide Chinese translation or proper format. |
| tencent/HY-MT1.5-7B | Provide technical support. Enable JavaScript for an optimal experience. | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Translate this sentence. |
致命
[准确性]
"Translate this sentence."
理由: Hypothesis is meta-instruction text, not a translation; should be 'Badges' | The hypothesis is an instruction rather than a translation. It is also in English instead of Chinese. | Completely incorrect; system-message style output unrelated to the source term '徽章' (Badges). | Completely wrong translation: placeholder text instead of 'Badges' | Completely incorrect translation of '徽章' |
| CohereLabs/tiny-aya-water | Medal |
严重
[准确性]
"Medal"
理由: The correct term should be 'Badges'. | The hypothesis is in English, but the target language is specified as Chinese (zh). Additionally, 'Medal' may be a mistranslation of '徽章' which often refers to 'Badge' in digital contexts, but the primary error is the wrong language. | "徽章" in this UI context refers to badges, not medals; incorrect term chosen. |
| Qwen/Qwen2.5-14B-Instruct | Badge |
轻微
[准确性]
"Badge"
理由: The source '徽章' is generic and can be singular or plural. The reference 'Badges' implies a plural context (common in UI lists). 'Badge' (singular) might be inaccurate if the UI section displays multiple badges. | Hypothesis uses singular 'Badge' while reference uses plural 'Badges'. Number mismatch. | Source “徽章” is plural or generic; UI context usually expects “Badges” like the reference, while singular is still understandable but less accurate. | Singular form differs from reference plural 'Badges' |
| Qwen/Qwen2.5-7B-Instruct | Badge |
轻微
[准确性]
"Badge"
理由: Singular form where reference uses plural 'Badges'; potential mismatch with intended UI label if referring to multiple items. | Hypothesis uses singular 'Badge' while reference uses plural 'Badges'. The source '徽章' can be singular or plural, but reference indicates plural form is preferred. | Singular form; reference is 'Badges' (plural) | Source text '徽章' is Chinese, but hypothesis 'Badge' is English. The translation is in the wrong direction. |
| Qwen/Qwen3-14B | Badge |
轻微
[准确性]
"Badge"
理由: Source is plural-like in this context; reference indicates "Badges" as the intended meaning, not singular "Badge". | Hypothesis uses singular 'Badge' while reference uses plural 'Badges'. The source '徽章' can be singular or plural, but reference suggests plural form is preferred. | The source '徽章' often implies a category or collection in UI contexts, making the plural 'Badges' (reference) more accurate than the singular 'Badge'. | The translation uses a singular form while the reference uses a plural form. | Singular form instead of plural 'Badges' |
| Qwen/Qwen3.5-4B | Badge |
轻微
[准确性]
"Badge"
理由: Hypothesis uses singular 'Badge' while reference uses plural 'Badges'. The Chinese '徽章' can be plural, and reference uses plural form. | Singular form differs from reference plural 'Badges' | Source '徽章' often implies a collection or category in UI contexts; reference uses plural 'Badges'. | Source “徽章” matches the plural “Badges” in the reference, likely a UI label for a section; singular “Badge” may not match intended usage. |
| Qwen/Qwen3.5-9B | Badge |
轻微
[准确性]
"Badge"
理由: The source can be singular or plural, and the reference chooses plural (“Badges”); the hypothesis picks singular, slightly diverging from the intended plurality. | Source '徽章' is singular, but reference uses plural 'Badges'. Number agreement issue. | Singular form used while reference is plural |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Badge |
轻微
[准确性]
"Badge"
理由: Missing plural form. | Source and reference imply plural badges; singular form may be less accurate in this UI context. | Number mismatch: hypothesis uses singular 'Badge' while reference uses plural 'Badges' | Source is Chinese ('徽章'), target language is Chinese. Hypothesis is English. Complete failure to translate. |
| google/gemma-3-12b-it | Badge |
轻微
[准确性]
"Badge"
理由: The source '徽章' is a general term that often implies a collection or category in UI contexts, matching the plural reference 'Badges'. The singular 'Badge' might be too specific if it refers to a section header. | Number mismatch: singular 'Badge' vs plural 'Badges' | Number mismatch. Source is plural (徽章), should be 'Badges' to match reference. | Source lacks explicit number, but UI context in reference chooses plural “Badges”; singular “Badge” may be slightly misaligned with intended plurality. |
| google/gemma-3-1b-it | Badge |
轻微
[准确性]
"Badge"
理由: The source '徽章' is a general term that can be singular or plural. In UI contexts (implied by the reference 'Badges'), it often refers to a category or list, making the plural 'Badges' more natural. 'Badge' is not strictly wrong but less idiomatic for a menu item representing the feature. | Number mismatch: singular 'Badge' vs plural '徽章', which should align with 'Badges' in this UI context. | Singular form 'Badge' used in hypothesis while reference uses plural 'Badges' |
| google/gemma-3-4b-it | Submit |
致命
[准确性]
"Submit"
理由: Translation is completely incorrect. '徽章' means 'Badges' not 'Submit' | The translation does not match the source meaning. | Complete mistranslation - source means 'Badges' but hypothesis is 'Submit' | '徽章' should be translated as 'Badges'; 'Submit' is a submission action and is unrelated to the original meaning. | The source '徽章' means 'Badges', but the hypothesis translates it as 'Submit', which is unrelated. |
| google/translategemma-12b-it | Badge |
轻微
[准确性]
"Badge"
理由: Source "徽章" can be plural; the reference uses "Badges" indicating a plural UI label. Singular "Badge" may misrepresent the intended meaning. | Hypothesis uses singular 'Badge' but reference uses plural 'Badges'. Source '徽章' can be plural. | Number mismatch: hypothesis uses singular 'Badge' while reference uses plural 'Badges' | Number mismatch. The reference 'Badges' implies plurality, which is common for UI sections listing multiple items. The singular 'Badge' might be inaccurate if the context implies a collection. |
| google/translategemma-4b-it | Exit |
致命
[准确性]
"Exit"
理由: The translation is incorrect. 'Exit' does not match the source '徽章' or the reference 'Badges'. | Mistranslation: '徽章' means 'badges', not 'exit'. | The source text '徽章' means 'Badges', but the hypothesis translates it as 'Exit', which is completely wrong. | Completely wrong translation - 'Exit' does not mean '徽章' (Badges) | Hypothesis 'Exit' does not match source '徽章' (Badges) or reference 'Badges'. Completely incorrect translation. |
| tencent/HY-MT1.5-1.8B | 徽章的翻译是 |
严重
[准确性]
"徽章的翻译是"
理由: Incorrect translation of '徽章' as '徽章的翻译是' (The translation of '徽章') | Hypothesis is a meta-description ('the translation of 徽章 is') rather than the actual translation. Should be 'Badges' not a description. | Mistranslation: hypothesis adds extra words '的翻译是' (the translation is) rather than providing the term itself; reference is 'Badges' | Source is just “徽章” (badges); hypothesis adds extra metatext '的翻译是' and omits the actual translation, so it does not actually translate the term. | The hypothesis is a sentence fragment meaning 'The translation of Badge is', rather than providing the actual translation '徽章' or the target term. It fails to translate the concept into the target context. |
| tencent/HY-MT1.5-7B | badge | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | SMS | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | SMS | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | sms |
轻微
[风格]
"sms"
理由: Should be capitalized as the standard acronym “SMS” to match conventional usage. | Lowercase differs from reference uppercase 'SMS' | Acronyms like SMS are typically capitalized. The hypothesis uses lowercase 'sms'. | Hypothesis uses lowercase 'sms' while reference uses uppercase 'SMS'. Capitalization mismatch. |
| Qwen/Qwen2.5-7B-Instruct | sms |
轻微
[风格]
"sms"
理由: Capitalization; reference is 'SMS' (all caps) | Casing does not match standard acronym capitalization 'SMS' as in reference. | Hypothesis uses lowercase 'sms' while reference uses uppercase 'SMS'. Acronyms should typically be capitalized. | Lowercase instead of uppercase |
| Qwen/Qwen3-14B | sms |
轻微
[风格]
"sms"
理由: Hypothesis uses lowercase 'sms' while reference uses uppercase 'SMS'. Capitalization inconsistency for acronym. | Acronyms should be capitalized ('SMS') to match standard convention and the reference. | Lowercase instead of uppercase 'SMS' | Should follow standard capitalization as in the reference, "SMS". |
| Qwen/Qwen3.5-4B | sms |
轻微
[风格]
"sms"
理由: Hypothesis uses lowercase 'sms' while reference uses uppercase 'SMS'. Standard convention is to use uppercase for the acronym SMS. | Acronym should be capitalized as 'SMS' per standard convention and reference. | Capitalization differs from reference 'SMS' |
| Qwen/Qwen3.5-9B | SMS | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Message |
严重
[准确性]
"Message"
理由: Incorrect translation: 'sms' should be translated as 'SMS' (acronym), not 'Message' | Incorrect translation of 'sms'. | Source is English ('sms'), target language is Chinese. Hypothesis is English. Failure to translate. Additionally, 'sms' is often retained as an acronym or translated specifically, but the language mismatch is the primary error. | Incorrect translation: 'Message' is not equivalent to 'SMS' |
| google/gemma-3-12b-it | SMS | 通过 (无共识错误) |
| google/gemma-3-1b-it | SMS | 通过 (无共识错误) |
| google/gemma-3-4b-it | Send |
致命
[准确性]
"Send"
理由: The source is the noun/abbreviation 'sms' and the reference is 'SMS', but the hypothesis is the verb 'Send', which is a different concept. | Translation is completely incorrect. 'sms' is an acronym that should remain as 'SMS' not be translated to 'Send' | The translation does not match the source meaning. | Mistranslation - 'sms' should remain as 'SMS' not converted to 'Send' | The source 'sms' refers to the protocol/message type, but the hypothesis translates it as the action 'Send'. |
| google/translategemma-12b-it | SMS | 通过 (无共识错误) |
| google/translategemma-4b-it | SMS | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: Hypothesis '登录' (login) does not match source 'sms' or reference 'SMS'. Completely incorrect translation. | Incorrect translation of 'sms' as '登录' (login) | Mistranslation: '登录' (login) does not match source 'sms' or reference 'SMS' | “sms” should correspond to “短信/SMS”; “登录” (log in) is incorrect. | Hallucination: 'sms' was translated as '登录' (Login), which is unrelated. It should be '短信' or kept as 'SMS' depending on convention. |
| tencent/HY-MT1.5-7B | 短信 | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | UC Cloud Storage |
轻微
[术语]
"UC Cloud Storage"
理由: Hypothesis uses 'Storage' but reference uses 'Drive'; terminology mismatch for '网盘' | Terminology variation: 'Storage' vs 'Drive' - both acceptable but reference uses 'Drive' | Different term choice from reference ('Drive' vs 'Storage'); similar concept but not the preferred terminology. | Slightly different term than 'UC Cloud Drive' |
| CohereLabs/tiny-aya-water | UC Cloud Storage |
轻微
[术语]
"UC Cloud Storage"
理由: Hypothesis uses 'Storage' while reference is 'Drive' - slight terminology difference | The correct term should be 'UC Cloud Drive'. | Hypothesis uses 'Storage' but reference uses 'Drive'. Both are acceptable but 'Drive' is the more precise term for '网盘'. | "Cloud Drive" is the preferred term for this product; "Cloud Storage" is understandable but not the expected designation. |
| Qwen/Qwen2.5-14B-Instruct | UC Cloud Drive | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | UC Cloud Disk |
轻微
[术语]
"UC Cloud Disk"
理由: 'Cloud Disk' is a less common term than 'Cloud Drive' and does not match the established product naming in the reference. | Incorrect term; should be 'UC Cloud Drive' | Hypothesis uses 'Disk' while reference uses 'Drive'. Both are acceptable translations of '网盘', but 'Drive' is more commonly used in English for cloud storage services. | Terminology inconsistency; reference is 'UC Cloud Drive' |
| Qwen/Qwen3-14B | UC Cloud Drive | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | UC Cloud Drive | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | UC Cloud Drive | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | UC Drive |
轻微
[准确性]
"UC Drive"
理由: Incomplete translation: hypothesis is 'UC Drive' but reference is 'UC Cloud Drive' - missing 'Cloud' component | Omits 'Cloud' present in the reference 'UC Cloud Drive', reducing specificity of the service name. | Missing 'Cloud' in the translation. | Source is Chinese ('UC 网盘'), target language is Chinese. Hypothesis is English. Complete failure to translate. |
| google/gemma-3-12b-it | UC Cloud Disk |
轻微
[术语]
"UC Cloud Disk"
理由: Terminology: 'Disk' vs 'Drive' - different word choice for storage service | “Cloud Drive” is the more standard term for online storage; “Cloud Disk” is understandable but nonstandard terminology. | The official English name for 'UC 网盘' is 'UC Cloud Drive'. 'Disk' is a literal translation but not the established brand terminology. | Terminology variation. Should be 'UC Cloud Drive' not 'Disk' to match reference. | The hypothesis uses 'Disk' instead of 'Drive' as in the reference. |
| google/gemma-3-1b-it | UC Network Drive |
轻微
[术语]
"UC Network Drive"
理由: '网盘' specifically refers to 'Cloud Drive' or 'Cloud Storage'. 'Network Drive' typically refers to a local network share (NAS/SMB), which is a different technology. This is a significant terminology error. | Inconsistent terminology: 'Network Drive' vs 'Cloud Drive' | Should be 'UC Cloud Drive'. | Hypothesis uses 'Network Drive' while reference uses 'Cloud Drive'; incorrect terminology for UC网盘 | 'Network Drive' is understandable but diverges from the more standard/cloud-oriented term 'Cloud Drive' used in the reference. |
| google/gemma-3-4b-it | UC Disk |
轻微
[术语]
"UC Disk"
理由: The official English name for 'UC 网盘' is 'UC Cloud Drive'. 'UC Disk' is a non-standard literal translation. | Should be 'UC Cloud Drive' to match reference terminology | The translation uses 'Disk' instead of the more precise 'Cloud Drive'. | Incomplete translation. '网盘' should be 'Cloud Drive' not just 'Disk'. Reference is 'UC Cloud Drive' |
| google/translategemma-12b-it | UC Cloud Disk |
轻微
[术语]
"UC Cloud Disk"
理由: "UC Cloud Drive" is the preferred product term per the reference; "Disk" deviates from established terminology. | Hypothesis uses 'Disk' but reference uses 'Drive'. 'Drive' is the more standard terminology for cloud storage services. | Terminology preference. 'Cloud Drive' (reference) is the more idiomatic and standard term for cloud storage services compared to 'Cloud Disk'. | Terminology variation: hypothesis uses 'Disk' while reference uses 'Drive' | The hypothesis uses 'Disk' instead of 'Drive' as in the reference. |
| google/translategemma-4b-it | UC Cloud Storage |
轻微
[术语]
"UC Cloud Storage"
理由: While 'Cloud Storage' is semantically close, the official brand name for 'UC 网盘' is typically 'UC Cloud Drive' as indicated by the reference. 'Storage' is a slight deviation from the established proper noun. | Terminology inconsistency - 'UC Cloud Drive' is the standard translation for 'UC网盘' | Hypothesis 'UC Cloud Storage' translates '网盘' as 'Storage' but reference uses 'Drive' which is more accurate for this term. |
| tencent/HY-MT1.5-1.8B | UC网盘 |
轻微
[准确性]
"UC网盘"
理由: Target language is zh but reference shows the expected English branding “UC Cloud Drive”; leaving the Chinese term untranslated does not follow the intended translation direction. | Hypothesis is in Chinese (UC网盘) but reference is in English (UC Cloud Drive). Should be translated to English. | Direct use of source text 'UC网盘' instead of translating to 'UC Cloud Drive' | Wrong target language: The hypothesis copies the source Chinese text, but the target language is Chinese and the reference indicates the expected output is English ('UC Cloud Drive'). This suggests a failure to translate. |
| tencent/HY-MT1.5-7B | UC Cloud Disk |
轻微
[术语]
"UC Cloud Disk"
理由: The official English name for 'UC 网盘' is 'UC Cloud Drive', not 'Disk'. | Use of 'Disk' instead of 'Drive'. | Hypothesis uses 'Disk' but reference uses 'Drive'. Both are acceptable but 'Drive' is the standard terminology. | Slightly different term from reference (“Drive”); “Disk” vs “Drive” may affect branding consistency but meaning is understandable. | Terminology: 'Disk' vs reference 'Drive' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Submit your application |
致命
[准确性]
"Submit your application"
理由: Completely incorrect translation of '最新帖子' | Completely wrong translation: 'Submit your application' instead of 'Latest Posts' | The translation is semantically unrelated to the source ('最新帖子' means Latest Posts) and is in the wrong language (English instead of Chinese). | Meaning entirely different from source '最新帖子' ('Latest Posts'). | Hypothesis 'Submit your application' is completely incorrect translation of '最新帖子' (Latest Posts); unrelated content |
| CohereLabs/tiny-aya-water | Latest Posts | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Latest Posts | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Latest Posts | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Latest Posts | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Latest posts |
轻微
[风格]
"Latest posts"
理由: Capitalization differs from reference 'Latest Posts' | Inconsistent title casing compared to reference 'Latest Posts'. | Hypothesis uses lowercase 'posts' while reference capitalizes as 'Latest Posts'. Capitalization is inconsistent with reference. |
| Qwen/Qwen3.5-9B | Latest Posts | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Latest Post |
轻微
[准确性]
"Latest Post"
理由: Uses singular 'Post' while reference is plural 'Posts'; may not match functionality listing multiple posts. | Source is Chinese ('最新帖子'), target language is Chinese. Hypothesis is English. Complete failure to translate. | Missing plural form. | Number mismatch: hypothesis uses singular 'Post' while reference uses plural 'Posts' |
| google/gemma-3-12b-it | Latest Posts | 通过 (无共识错误) |
| google/gemma-3-1b-it | Latest Post |
轻微
[准确性]
"Latest Post"
理由: Singular form 'Post' used in hypothesis while reference uses plural 'Posts' | Number mismatch: singular 'Post' vs plural '帖子', which should align with 'Latest Posts' in this context. | Similar to Item 2, '最新帖子' in a forum context usually implies a list of 'Latest Posts'. Using the singular 'Post' suggests a single specific item rather than the section title. |
| google/gemma-3-4b-it | Latest Post |
轻微
[准确性]
"Latest Post"
理由: Incorrect number. '最新帖子' is plural, should be 'Latest Posts' not 'Latest Post' | The source '最新帖子' implies plurality ('Posts'), whereas the hypothesis uses the singular 'Post'. | Renders the plural “最新帖子” as the singular “Latest Post”, which may mislead in a context that lists multiple posts. |
| google/translategemma-12b-it | Latest Posts | 通过 (无共识错误) |
| google/translategemma-4b-it | Search |
致命
[准确性]
"Search"
理由: The source text '最新帖子' means 'Latest Posts', but the hypothesis translates it as 'Search', which is a different function entirely. | The translation is incorrect. 'Search' does not match the source '最新帖子' or the reference 'Latest Posts'. | Completely wrong translation - 'Search' does not mean '最新帖子' (Latest Posts) | Hypothesis 'Search' does not match source '最新帖子' (Latest Posts) or reference 'Latest Posts'. Completely incorrect translation. | Mistranslation: '最新帖子' means 'latest posts', not 'search'. |
| tencent/HY-MT1.5-1.8B | Translation of '最新帖子' in en |
致命
[准确性]
"Translation of '最新帖子' in en"
理由: Expected is a direct translation like “Latest Posts”; hypothesis is a meta-description about translation and not a usable translation of the term. | The hypothesis is a meta-commentary string instead of the actual translation. It fails to provide the target text. | Incorrect translation of '最新帖子' as 'Translation of '最新帖子' in en' | Mistranslation: hypothesis contains meta-text rather than actual translation; reference is 'Latest Posts' | Hypothesis is a meta-description rather than actual translation. Should be 'Latest Posts' not a description of translation process. |
| tencent/HY-MT1.5-7B | Latest Posts | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Douyin | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Douyin | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | TikTok | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | TikTok |
轻微
[准确性]
"TikTok"
理由: Source text '抖音' is Chinese, but hypothesis 'TikTok' is English. The translation is in the wrong direction. | Only the international name is given; reference includes both 'Douyin / TikTok'. Some loss of source-specific branding nuance. | Hypothesis provides only 'TikTok' while reference provides 'Douyin / TikTok'. The source '抖音' is the Chinese name (Douyin), and the reference indicates both names should be included. |
| Qwen/Qwen3-14B | Douyin | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Douyin |
轻微
[准确性]
"Douyin"
理由: Reference indicates dual branding “Douyin / TikTok”; using only “Douyin” omits the globally known name and is slightly less informative. | Hypothesis provides only 'Douyin' while reference includes alternative 'Douyin / TikTok'. The reference provides more complete information for international audiences. | Omission of alternative name 'TikTok' which is present in the reference and often required for international context. |
| Qwen/Qwen3.5-9B | Douyin | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | TikTok | 通过 (无共识错误) |
| google/gemma-3-12b-it | TikTok | 通过 (无共识错误) |
| google/gemma-3-1b-it | 抖音 |
严重
[准确性]
"抖音"
理由: Source Chinese term is not translated into the target English equivalent (e.g., 'Douyin'); it is simply copied from the source. | The source is Chinese, and the target language is English (implied by the reference 'Douyin / TikTok' and other items). The hypothesis failed to translate the proper noun, leaving it in Chinese characters. | Does not translate to 'Douyin / TikTok'. | Hypothesis is in Chinese while reference provides English translation 'Douyin / TikTok'; source should be translated |
| google/gemma-3-4b-it | Douyin | 通过 (无共识错误) |
| google/translategemma-12b-it | Douyin | 通过 (无共识错误) |
| google/translategemma-4b-it | Douyin | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | 抖音 |
轻微
[准确性]
"抖音"
理由: Direct use of source text '抖音' instead of translating to 'Douyin / TikTok' | Wrong target language: The hypothesis copies the source Chinese text. Given the reference 'Douyin / TikTok', the expectation is an English translation/transliteration. | While “抖音” is correct in Chinese, reference indicates bilingual branding “Douyin / TikTok”; omitting the English name may be considered incomplete for the intended context. | Hypothesis is in Chinese (抖音) but reference is in English (Douyin / TikTok). Should be translated to English. |
| tencent/HY-MT1.5-7B | TikTok | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | LDLive | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | LDLive | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | LDLive | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | LDLive | 通过 (无共识错误) |
| Qwen/Qwen3-14B | LDLive | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | LDLive | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | LDLIVE | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Live LDL |
致命
[准确性]
"Live LDL"
理由: Mistranslation: 'Live LDL' is incorrect; should be 'LDLive' | Completely alters the product/name 'LDLive' into 'Live LDL', changing the brand string and meaning. | Source is a proper noun 'LDLive', target language is Chinese. Hypothesis is English and rearranges the letters incorrectly. Failure to translate/retain properly in target language. | Incorrect translation of 'LDLive'. | Incorrect translation: hypothesis reverses the order to 'Live LDL' while reference correctly maintains 'LDLive' as the proper name |
| google/gemma-3-12b-it | LDLive | 通过 (无共识错误) |
| google/gemma-3-1b-it | LDLive | 通过 (无共识错误) |
| google/gemma-3-4b-it | LDLive | 通过 (无共识错误) |
| google/translategemma-12b-it | LDLive | 通过 (无共识错误) |
| google/translategemma-4b-it | LDLive | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Login |
致命
[准确性]
"Login"
理由: Hypothesis 'Login' does not match source 'LDLive' or reference 'LDLive'. Should be a proper noun, not translated. | Incorrect translation of 'LDLive' as 'Login' | Mistranslation/Hallucination: 'LDLive' (a proper noun) was translated as 'Login'. Proper nouns should generally be preserved. | Mistranslation: 'Login' does not match source 'LDLive' or reference 'LDLive' | Source/brand is “LDLive”; “Login” is unrelated and changes the meaning from a proper name to an action. |
| tencent/HY-MT1.5-7B | LDLive | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Sisyphus's story has inspired countless works of art, literature, and philosophy. |
致命
[准确性]
"Sisyphus's story has inspired countless works of art, literature, and philosophy."
理由: Hypothesis provides extensive narrative about Sisyphus instead of simple translation 'Sisyphus'; completely over-expanded content | Major mistranslation: expanded explanation instead of the name 'Sisyphus' | Adds a full descriptive sentence instead of translating the proper name; meaning extends far beyond the source. | Incorrect expansion of '西西弗斯' | Hypothesis adds extensive content not present in the source '西西弗斯' (Sisyphus). The source is just a name, while the hypothesis is a full sentence explaining the impact of the story. This is a severe hallucination and accuracy failure. |
| CohereLabs/tiny-aya-water | Sisyphus | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Sisyphus | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Sisyphus | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Sisyphus | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Sisyphus | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Sisyphus | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Translation of '西西弗斯' in en |
致命
[准确性]
"Translation of '西西弗斯' in en"
理由: Invalid output: hypothesis provides meta-text about translation rather than the actual translation 'Sisyphus' | The hypothesis contains meta-text explaining the translation task instead of providing the actual translation. Also, it is in English, violating the target language constraint. | Meta description instead of actual translation; does not provide the required term 'Sisyphus'. | Should be 'Sisyphus'. | Placeholder text instead of actual translation; reference provides 'Sisyphus' |
| google/gemma-3-12b-it | Sisyphus | 通过 (无共识错误) |
| google/gemma-3-1b-it | Westfides |
致命
[准确性]
"Westfides"
理由: Mistranslation: 'Westfides' is incorrect; should be 'Sisyphus' | Mistranslation of the proper name '西西弗斯', which should be 'Sisyphus'; the given form is incorrect and misleading. | Incorrect translation: '西西弗斯' should be 'Sisyphus' not 'Westfides' | '西西弗斯' is the standard Chinese translation for the Greek mythological figure 'Sisyphus'. 'Westfides' is a hallucinated translation that incorrectly breaks down the characters ('West' for 西,'fides'??) and does not correspond to the actual entity. |
| google/gemma-3-4b-it | Sisyphus | 通过 (无共识错误) |
| google/translategemma-12b-it | Sisyphus | 通过 (无共识错误) |
| google/translategemma-4b-it | Sisyphus | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | 西西弗斯 |
严重
[准确性]
"西西弗斯"
理由: Hypothesis is in Chinese (西西弗斯) but reference is in English (Sisyphus). Should be translated to English. | Wrong target language: The hypothesis copies the source Chinese text. The reference 'Sisyphus' indicates the target should be English. | Direct use of source text '西西弗斯' instead of translating to 'Sisyphus' | Source person/name “西西弗斯” is typically rendered as “Sisyphus” in English; leaving it in Chinese when English is expected does not match the reference direction. |
| tencent/HY-MT1.5-7B | Sisyphus | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Food | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Food |
轻微
[准确性]
"Food"
理由: Hypothesis provides only 'Food' but reference includes 'Food / Cuisine' offering a more complete translation of '美食'. | "美食" usually implies cuisine/gourmet food; "Food" is generic and loses the nuance but remains understandable. | The source text is Chinese ('美食'), but the hypothesis provides the English translation ('Food') instead of retaining the Chinese characters. The target language is specified as zh (Chinese). |
| Qwen/Qwen2.5-14B-Instruct | Delicacies |
轻微
[准确性]
"Delicacies"
理由: Hypothesis 'Delicacies' differs from reference 'Food / Cuisine'. These represent different semantic choices for '美食'. | “美食” refers broadly to food/cuisine; “Delicacies” narrows meaning to particularly exquisite or rare foods. | The source '美食' generally means 'food' or 'cuisine'. 'Delicacies' implies rare, expensive, or exquisitely prepared food, which is a narrower and more specific meaning than the general term '美食'. |
| Qwen/Qwen2.5-7B-Instruct | Cuisine |
轻微
[准确性]
"Cuisine"
理由: Source text '美食' is Chinese, but hypothesis 'Cuisine' is English. The translation is in the wrong direction. | Partial match; reference is 'Food / Cuisine' | Hypothesis provides only 'Cuisine' while reference provides 'Food / Cuisine'. The source '美食' can mean both, and reference indicates both options should be included. | 'Cuisine' emphasizes style of cooking rather than the broader sense of 'food' in 美食; partially covered by reference but slightly narrower. |
| Qwen/Qwen3-14B | Cuisine | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Gourmet food |
轻微
[准确性]
"Gourmet food"
理由: Hypothesis translates as 'Gourmet food' while reference uses 'Food / Cuisine'. The hypothesis is too narrow; '美食' is better translated as 'Food' or 'Cuisine' rather than specifically 'Gourmet food'. | “美食” is broadly “delicious food/foodie content”; “Gourmet food” narrows it to high-end cuisine, introducing a nuance not present in the source. | Source '美食' is a general category for food/cuisine; 'Gourmet food' is too specific and excludes general food content. |
| Qwen/Qwen3.5-9B | Delicious food | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Food | 通过 (无共识错误) |
| google/gemma-3-12b-it | Food | 通过 (无共识错误) |
| google/gemma-3-1b-it | Food | 通过 (无共识错误) |
| google/gemma-3-4b-it | Submit |
致命
[准确性]
"Submit"
理由: The source '美食' means 'Food' or 'Cuisine', but the hypothesis translates it as 'Submit', which is unrelated. | Translation is completely incorrect. '美食' means 'Food / Cuisine' not 'Submit' | “美食” should be translated as “Food/Cuisine”, whereas “Submit” is an action button unrelated to the content. | The translation does not match the source meaning. | Complete mistranslation - source means 'Food / Cuisine' but hypothesis is 'Submit' |
| google/translategemma-12b-it | Food |
轻微
[准确性]
"Food"
理由: "美食" emphasizes gourmet/culinary aspect; "Food / Cuisine" (reference) is slightly more specific than plain "Food". | Hypothesis provides only 'Food' but reference includes 'Food / Cuisine' to provide both a direct translation and a more specific alternative. | Incomplete translation. The source '美食' specifically refers to delicious food or cuisine. The reference suggests 'Cuisine' as a more precise alternative or addition to 'Food'. |
| google/translategemma-4b-it | Forgot Password |
致命
[准确性]
"Forgot Password"
理由: Hypothesis 'Forgot Password' does not match source '美食' (Food/Cuisine) or reference 'Food / Cuisine'. Completely incorrect translation. | Completely wrong translation - 'Forgot Password' does not mean '美食' (Food / Cuisine) | The source text '美食' means 'Food' or 'Cuisine', but the hypothesis translates it as 'Forgot Password', which is completely unrelated. | Mistranslation: '美食' refers to 'food/cuisine', unrelated to 'Forgot Password'. | The translation is incorrect. 'Forgot Password' does not match the source '美食' or the reference 'Food / Cuisine'. |
| tencent/HY-MT1.5-1.8B | Translation of '美食' in en |
致命
[准确性]
"Translation of '美食' in en"
理由: Incorrect translation of '美食' as 'Translation of '美食' in en' | Hypothesis is a meta-description rather than actual translation. Should be 'Food / Cuisine' not a description of translation process. | The hypothesis is a meta-commentary string instead of the actual translation. | Mistranslation: hypothesis contains meta-text rather than actual translation; reference is 'Food / Cuisine' | Should directly translate as “Food / Cuisine” or similar; instead, it is a meta-phrase about translation and not the translation itself. |
| tencent/HY-MT1.5-7B | Food and drinks |
轻微
[准确性]
"Food and drinks"
理由: Hypothesis 'Food and drinks' does not match reference 'Food / Cuisine'. The source '美食' means cuisine/delicious food, not drinks. | The source '美食' strictly means 'gourmet food' or 'cuisine'. Adding 'drinks' is an addition not present in the source text. | Adds “drinks,” which is not explicit in “美食”; minor over-translation, though often implied in context. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Oracle | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Oracle | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Oracle | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Oracle | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Oracle | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Oracle | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Oracle | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Oracle | 通过 (无共识错误) |
| google/gemma-3-12b-it | Oracle | 通过 (无共识错误) |
| google/gemma-3-1b-it | Oracle | 通过 (无共识错误) |
| google/gemma-3-4b-it | Cancel |
致命
[准确性]
"Cancel"
理由: Complete mistranslation - source is 'Oracle' but hypothesis is 'Cancel' | The source 'Oracle' is a proper noun (brand/database), but the hypothesis translates it as 'Cancel', which is completely wrong. | Translation is completely incorrect. 'Oracle' is a proper noun (company/database name) and should remain 'Oracle' not 'Cancel' | The proper noun “Oracle” is mistranslated as the action “Cancel”; the meaning is completely wrong. | The translation does not match the source meaning. |
| google/translategemma-12b-it | Oracle | 通过 (无共识错误) |
| google/translategemma-4b-it | YouTube |
致命
[准确性]
"YouTube"
理由: Hypothesis 'YouTube' does not match source 'Oracle' or reference 'Oracle'. Completely incorrect translation. | The source text 'Oracle' is a specific company/database name, but the hypothesis translates it as 'YouTube', a completely different entity. | The translation is incorrect. 'YouTube' does not match the source 'Oracle' or the reference 'Oracle'. | Mistranslation: 'Oracle' is a distinct product/company, not 'YouTube'. | Completely wrong translation - 'YouTube' does not mean 'Oracle' |
| tencent/HY-MT1.5-1.8B | Oracle | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Oracle | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | An error occurred during processing |
致命
[准确性]
"An error occurred during processing"
理由: Unrelated to the source idiom '扬帆起航' ('Setting Sail'), indicating a system error message instead of a translation. | Completely incorrect translation of '扬帆起航' | Completely wrong translation: error message instead of 'Setting Sail' | Hypothesis is an error message unrelated to the source '扬帆起航' (Setting Sail). Complete failure to translate; hallucinated error message. | Hypothesis 'An error occurred during processing' is completely incorrect translation of '扬帆起航' (Setting Sail); unrelated content |
| CohereLabs/tiny-aya-water | Set sail and embark on a journey | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | "en": "Set Sail", "zh-TW": " fl fl fl fl fl fl fl fl [the token "fl" repeats for the remainder of the output] | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | 扬帆起航 |
严重
[准确性]
"扬帆起航"
理由: Source text '扬帆起航' is Chinese, and target language is Chinese. However, the hypothesis is identical to the source, indicating a failure to translate or adapt, while the reference provides an English translation 'Setting Sail'. Given the pattern of other items, this appears to be a case where the model failed to output the target language (if the prompt implies EN->ZH for mixed data) or failed to translate ZH->EN. Based on the item metadata 'Target Language: zh', the source is English? No, source is Chinese. If Target is ZH, and Source is ZH, no translation is needed unless it's a specific task. However, looking at the reference 'Setting Sail', the task was clearly ZH->EN. The metadata 'Target Language: zh' seems incorrect for the actual task direction implied by the reference, OR the model failed to translate ZH to ZH (tautology). Assuming the task is ZH->EN based on references, the error is wrong language. If the task is ZH->ZH, copying is fine. But given 'Setting Sail' reference, the expected output was English. Hypothesis is Chinese. Error: Wrong target language. | Hypothesis leaves source text untranslated in Chinese while reference provides English translation 'Setting Sail'. Target language is zh but this appears to be a translation task requiring English output. | Source text is untranslated; the hypothesis simply copies the Chinese phrase instead of rendering its meaning ('Setting Sail'). | No translation provided; reference is 'Setting Sail' |
| Qwen/Qwen3-14B | Set sail | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Set sail | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Set sail and start | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Raise Sail and Sailing |
严重
[准确性]
"Raise Sail and Sailing"
理由: Inaccurate translation: hypothesis is 'Raise Sail and Sailing' but reference is 'Setting Sail' - different meaning and awkward phrasing | Incorrect translation of '扬帆起航'. | Source is Chinese ('扬帆起航'), target language is Chinese. Hypothesis is English. Complete failure to translate. | Over-translation and awkward phrasing; reference is more concise 'Setting Sail' |
| google/gemma-3-12b-it | Set sail and embark on a journey |
轻微
[风格]
"Set sail and embark on a journey"
理由: Longer, more promotional phrasing than the concise “Setting Sail”; meaning is preserved but style diverges from the more succinct reference. | Style: overly literal translation vs concise 'Setting Sail' | The hypothesis is overly verbose compared to the concise source '扬帆起航' and the reference 'Setting Sail'. It adds 'embark on a journey' which is implied but not explicitly stated, affecting brevity. |
| google/gemma-3-1b-it | 扬帆起航 - Translation of '扬帆起航' in en |
严重
[流畅性]
"扬帆起航 - Translation of '扬帆起航' in en"
理由: Includes untranslated Chinese and meta-explanatory text, not a proper concise translation suitable for use as a label. | Unnecessary addition of 'Translation of '扬帆起航' in en'. | Unnecessary meta-commentary in translation |
| google/gemma-3-4b-it | Confirm |
致命
[准确性]
"Confirm"
理由: Translation is completely incorrect. '扬帆起航' means 'Setting Sail' not 'Confirm' | Mistranslation - source means 'Setting Sail' but hypothesis is 'Confirm' | The translation does not match the source meaning. | “扬帆起航” means “Setting Sail/Set Sail”, which does not match “Confirm” at all. | The source '扬帆起航' is an idiom meaning 'Setting Sail' or 'Embarking on a journey', but the hypothesis translates it as 'Confirm'. |
| google/translategemma-12b-it | Set sail and embark on a journey |
轻微
[风格]
"Set sail and embark on a journey"
理由: Style difference: hypothesis is more verbose than reference 'Setting Sail' | Over-translation. The hypothesis is significantly more verbose than the concise source '扬帆起航' and the reference 'Setting Sail', adding unnecessary words ('embark on a journey'). | The hypothesis is more verbose and figurative than the concise phrase "Setting Sail" in the reference; meaning is preserved but style diverges. |
| google/translategemma-4b-it | Reload |
致命
[准确性]
"Reload"
理由: Hypothesis 'Reload' does not match source '扬帆起航' (Setting Sail) or reference 'Setting Sail'. Completely incorrect translation. | Completely wrong translation - 'Reload' does not mean '扬帆起航' (Setting Sail) | Mistranslation: '扬帆起航' means 'set sail' metaphorically, not 'reload'. | The source text '扬帆起航' is an idiom meaning 'Setting Sail' or 'Embarking on a journey', but the hypothesis translates it as 'Reload', which is unrelated. | The translation is incorrect. 'Reload' does not match the source '扬帆起航' or the reference 'Setting Sail'. |
| tencent/HY-MT1.5-1.8B | Yáng Fān Qǐ Háng |
严重
[准确性]
"Yáng Fān Qǐ Háng"
理由: Direct use of pinyin 'Yáng Fān Qǐ Háng' instead of translating to 'Setting Sail' | Hypothesis is pinyin romanization but reference is proper English translation 'Setting Sail'. Should translate meaning, not transliterate. | The hypothesis is only a pinyin transliteration; expected is a semantic translation like “Setting Sail,” so the meaning is not conveyed. | Incorrect translation strategy: The idiom was transliterated into Pinyin instead of being translated into English ('Setting Sail') as per the reference and target language expectation. | Mistranslation: pinyin transliteration does not match reference 'Setting Sail' |
| tencent/HY-MT1.5-7B | Set sail |
轻微
[风格]
"Set sail"
理由: The source '扬帆起航' is a noun phrase or gerund concept in this context. The reference 'Setting Sail' matches this nominal style better than the imperative/verb phrase 'Set sail'. | Hypothesis uses 'Set sail' but reference uses 'Setting Sail' with gerund form and capitalization. | Style: 'Set sail' vs 'Setting Sail', different grammatical form |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Customer Reviews |
致命
[准确性]
"Customer Reviews"
理由: Hypothesis 'Customer Reviews' is completely incorrect translation of '高级推广' (Advanced Promotions); unrelated content | Hypothesis 'Customer Reviews' does not match source '高级推广' (Advanced Promotions). Completely wrong meaning; severe accuracy error. | Mistranslation; '高级推广' refers to advanced promotion/marketing, not customer reviews. | Completely wrong translation: 'Customer Reviews' instead of 'Advanced Promotions' | Completely incorrect translation of '高级推广' |
| CohereLabs/tiny-aya-water | Advanced Promotion | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: The source '高级推广' likely refers to a feature or category which is often pluralized in English UIs ('Promotions') to indicate various types of advanced promotional tools. Singular 'Promotion' might imply a single instance. | Hypothesis uses singular 'Promotion' while reference uses plural 'Promotions'. Number mismatch. | Singular form differs from reference plural 'Advanced Promotions' | “高级推广” in many UI/business contexts is understood as a feature name often rendered as “Advanced Promotions”; singular vs plural is a slight mismatch. |
| Qwen/Qwen2.5-7B-Instruct | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: Singular form; reference is 'Advanced Promotions' (plural) | Singular 'Promotion' vs plural 'Promotions' in reference; may underrepresent multiple promotional activities. | Source text '高级推广' is Chinese, but hypothesis 'Advanced Promotion' is English. The translation is in the wrong direction. | Hypothesis uses singular 'Promotion' while reference uses plural 'Promotions'. The source '高级推广' can be singular or plural, but reference indicates plural form is preferred. |
| Qwen/Qwen3-14B | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: Source "高级推广" is typically plural in this UI/business context; reference uses "Advanced Promotions" rather than singular. | Singular instead of plural 'Advanced Promotions' | The source '高级推广' likely refers to a category of features, making the plural 'Promotions' (reference) more accurate than the singular. | Hypothesis uses singular 'Promotion' while reference uses plural 'Promotions'. The plural form is more appropriate for the context. |
| Qwen/Qwen3.5-4B | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: Source “高级推广” is typically plural in this UI/business context; “Advanced Promotion” (singular) is slightly less aligned with expected meaning than “Advanced Promotions.” | Singular differs from reference plural 'Advanced Promotions' | Hypothesis uses singular 'Promotion' while reference uses plural 'Promotions'. The plural form is more appropriate for '高级推广'. | Source implies a category of promotions; reference uses plural 'Promotions'. |
| Qwen/Qwen3.5-9B | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: Singular form used while reference is plural 'Promotions' | Hypothesis uses singular 'Promotion', but reference uses plural 'Promotions'. Number agreement issue. | The source and reference use a plural sense (“高级推广” / “Advanced Promotions”), while the hypothesis uses singular, creating a small mismatch in number. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: Singular form differs from reference 'Advanced Promotions'; may underrepresent multiple promotion options. | Missing plural form. | Number mismatch: hypothesis uses singular 'Promotion' while reference uses plural 'Promotions' | Source is Chinese ('高级推广'), target language is Chinese. Hypothesis is English. Complete failure to translate. |
| google/gemma-3-12b-it | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: “高级推广” likely refers to a feature/category with multiple promotions; the plural “Promotions” in the reference better reflects this. Singular is slightly less accurate. | The source '高级推广' likely refers to a category of features or multiple promotion types, aligning with the plural reference 'Advanced Promotions'. The singular 'Promotion' might be less accurate for a menu item representing a group. | Number mismatch. Should be 'Advanced Promotions' (plural) to match reference. | Number mismatch: singular 'Promotion' vs plural 'Promotions' |
| google/gemma-3-1b-it | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: While 'Promotion' can be uncountable, in the context of UI features or marketing tools (implied by '高级推广'), the plural 'Promotions' is often preferred to denote the section or multiple types of promotions. The singular is acceptable but slightly less precise than the reference. | Singular form 'Promotion' used in hypothesis while reference uses plural 'Promotions' | Number mismatch with reference 'Advanced Promotions'; singular may not fully reflect the intended category/plural sense. |
| google/gemma-3-4b-it | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: Incorrect number. '高级推广' is plural, should be 'Advanced Promotions' not 'Advanced Promotion' | The source '高级推广' often implies a category of multiple items or a general concept, better rendered as the plural 'Advanced Promotions' per the reference, though singular is not strictly ungrammatical. | The source “高级推广” usually refers to multiple promotional activities, so “Advanced Promotions” is more natural; the singular deviates slightly from the reference meaning but is still understandable. |
| google/translategemma-12b-it | Advanced Promotion |
轻微
[准确性]
"Advanced Promotion"
理由: Number mismatch: hypothesis uses singular 'Promotion' while reference uses plural 'Promotions' | Hypothesis uses singular 'Promotion' but reference uses plural 'Promotions'. Source '高级推广' should be plural. | Source "高级推广" likely refers to a feature/category; reference uses plural "Advanced Promotions" suggesting multiple promotion options. Singular may under-specify the concept. | Number mismatch. The reference 'Advanced Promotions' uses the plural form, which is standard for category labels encompassing multiple promotional activities. |
| google/translategemma-4b-it | Promotions |
轻微
[准确性]
"Promotions"
理由: The translation is less specific than the reference. | Hypothesis 'Promotions' omits the 'Advanced' modifier from source '高级推广'. Reference 'Advanced Promotions' is more complete and accurate. | The source text '高级推广' specifically means 'Advanced Promotions'. The hypothesis 'Promotions' omits the modifier 'Advanced' (高级), losing specific meaning. | Missing nuance: '高级推广' is 'advanced promotions'; hypothesis omits the 'advanced' qualifier. |
| tencent/HY-MT1.5-1.8B | 高级推广 |
轻微
[准确性]
"高级推广"
理由: Wrong target language: The hypothesis copies the source Chinese text. The reference 'Advanced Promotions' indicates the target should be English. | Target language expected is English (“Advanced Promotions”); hypothesis simply copies the Chinese source and does not translate. | Hypothesis is in Chinese (高级推广) but reference is in English (Advanced Promotions). Should be translated to English. | Direct use of source text '高级推广' instead of translating to 'Advanced Promotions' |
| tencent/HY-MT1.5-7B | Advanced Promotion | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Adobe | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Adobe | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Adobe | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Adobe | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Adobe | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Adobe | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Adobe | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Adobe | 通过 (无共识错误) |
| google/gemma-3-12b-it | Adobe | 通过 (无共识错误) |
| google/gemma-3-1b-it | Adobe | 通过 (无共识错误) |
| google/gemma-3-4b-it | Update |
致命
[准确性]
"Update"
理由: Translation is completely incorrect. 'Adobe' is a proper noun (company name) and should remain 'Adobe' not 'Update' | The source 'Adobe' is a brand name, but the hypothesis translates it as 'Update', which is unrelated. | Complete mistranslation - source is 'Adobe' but hypothesis is 'Update' | The brand name “Adobe” is mistranslated as “Update”, which changes the entity entirely. | The translation does not match the source meaning. |
| google/translategemma-12b-it | Adobe | 通过 (无共识错误) |
| google/translategemma-4b-it | Adobe | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Adobe | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Adobe | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Forum Level |
轻微
[准确性]
"Forum Level"
理由: Partially translated; omits the nuance of 'trust' present in 'Forum Trust Levels' but still related to levels in the forum. | Source is Chinese '论坛等级', target language is zh, but hypothesis outputs English 'Forum Level'. While the meaning is close to reference 'Forum Trust Levels', the primary error is failure to output in Chinese. Also, 'Level' vs 'Trust Levels' is a slight terminology nuance, but the main issue is language mismatch. | Hypothesis 'Forum Level' is incomplete; reference is 'Forum Trust Levels' which better captures '论坛等级' |
| CohereLabs/tiny-aya-water | Forum Level | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Forum Level |
轻微
[术语]
"Forum Level"
理由: Term 'Forum Level' differs from reference 'Forum Trust Levels' - different semantic meaning | The translation uses 'Forum Level' instead of 'Forum Trust Levels', which is a slight variation but still accurate. | The reference 'Forum Trust Levels' suggests a specific system (like Discourse) where 'Level' specifically refers to 'Trust Levels'. 'Forum Level' is generic and might miss the specific terminology expected in this context. |
| Qwen/Qwen2.5-7B-Instruct | Forum Level |
严重
[准确性]
"Forum Level"
理由: Hypothesis translates as 'Forum Level' while reference uses 'Forum Trust Levels'. The meaning is significantly different; '论坛等级' refers to trust levels, not just generic levels. | Source text '论坛等级' is Chinese, but hypothesis 'Forum Level' is English. The translation is in the wrong direction. | Missing 'Trust' in translation | More generic than 'Forum Trust Levels' and omits the 'trust' nuance present in the reference. |
| Qwen/Qwen3-14B | Forum Level |
轻微
[准确性]
"Forum Level"
理由: Hypothesis translates as 'Forum Level' but reference is 'Forum Trust Levels'. The source '论坛等级' should include 'Trust' concept and use plural form. Significant semantic difference. | The translation is less detailed than the reference. | The intended concept per reference is "Forum Trust Levels", specifically indicating trust levels, not generic levels. | The translation 'Forum Level' is imprecise; the specific system terminology is 'Forum Trust Levels' as indicated by the reference. |
| Qwen/Qwen3.5-4B | Forum Rank |
严重
[准确性]
"Forum Rank"
理由: Hypothesis translates as 'Forum Rank' while reference uses 'Forum Trust Levels'. These convey different meanings; '论坛等级' is better translated as 'Trust Levels' rather than 'Rank'. | The translation does not fully capture the meaning of the reference, which refers to trust levels. | “论坛等级” as used in Discourse-like contexts refers to trust levels, not generic rank; “Forum Rank” loses the specific concept of “trust levels.” |
| Qwen/Qwen3.5-9B | Forum Level |
轻微
[准确性]
"Forum Level"
理由: The translation misses the specific meaning of '等级' in this context. The reference 'Forum Trust Levels' indicates a specific system (likely Discourse) where 'Trust Levels' is the correct terminology. 'Forum Level' is vague and potentially misleading. | The translation does not include the word 'Trust' which is present in the reference. | Hypothesis 'Forum Level' is incomplete. Reference 'Forum Trust Levels' provides the full and correct meaning of '论坛等级'. | The hypothesis omits “Trust”, translating only as “Forum Level”, whereas the reference specifies “Forum Trust Levels”, which is more precise for common forum terminology. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Forum Level |
严重
[准确性]
"Forum Level"
理由: Missing 'Trust' in the translation. | Source is Chinese ('论坛等级'), target language is Chinese. Hypothesis is English. Complete failure to translate. | Misses 'Trust' present in reference 'Forum Trust Levels', which is important to the specific Discourse concept of trust levels. | Inaccurate translation: hypothesis is 'Forum Level' but reference is 'Forum Trust Levels' - missing 'Trust' and number mismatch |
| google/gemma-3-12b-it | Forum Level |
轻微
[准确性]
"Forum Level"
理由: The hypothesis omits the word 'Trust' which is present in the reference. | Incorrect translation. Should be 'Forum Trust Levels' not 'Forum Level' to match reference. | “论坛等级” in the given context corresponds to “Forum Trust Levels”; “Forum Level” is more generic and misses the specific “trust levels” concept. |
| google/gemma-3-1b-it | Forum Level |
轻微
[术语]
"Forum Level"
理由: Incomplete translation: 'Level' vs 'Trust Levels' | In forum software contexts (like Discourse, implied by 'Trust Levels' in reference), '等级' specifically refers to 'Trust Levels'. 'Forum Level' is a literal translation that lacks the specific domain terminology. | Should be 'Forum Trust Levels'. |
| google/gemma-3-4b-it | Forum Level |
严重
[术语]
"Forum Level"
理由: Reference uses 'Trust Levels' but hypothesis uses 'Forum Level' - different terminology | The specific term for '论坛等级' in this context (likely Discourse-based) is 'Forum Trust Levels'. 'Forum Level' loses the specific nuance. | The translation omits 'Trust' from the reference. |
| google/translategemma-12b-it | Forum Level |
严重
[准确性]
"Forum Level"
理由: Hypothesis 'Forum Level' does not match reference 'Forum Trust Levels'. The source '论坛等级' should be translated as 'Trust Levels' not just 'Level'. | Loss of specific meaning. The source '论坛等级' in community software contexts specifically refers to 'Trust Levels' (a Discourse concept), not just a generic 'Level'. This changes the technical meaning. | "论坛等级" in this context refers to the trust level system; "Forum Trust Levels" in the reference is more specific than the generic "Forum Level". |
| google/translategemma-4b-it | Message |
致命
[准确性]
"Message"
理由: The source text '论坛等级' means 'Forum Trust Levels' or 'Forum Ranks', but the hypothesis translates it as 'Message', which is incorrect. | The translation is incorrect. 'Message' does not match the source '论坛等级' or the reference 'Forum Trust Levels'. | Mistranslation: '论坛等级' refers to 'forum levels/trust levels', not 'message'. | Completely wrong translation - 'Message' does not mean '论坛等级' (Forum Trust Levels) | Hypothesis 'Message' does not match source '论坛等级' (Forum Trust Levels) or reference 'Forum Trust Levels'. Completely incorrect translation. |
| tencent/HY-MT1.5-1.8B | Translation of '论坛等级' in en |
致命
[准确性]
"Translation of '论坛等级' in en"
理由: Hypothesis is a meta-description rather than actual translation. Should be 'Forum Trust Levels' not a description of translation process. | Incorrect translation of '论坛等级' as 'Translation of '论坛等级' in en' | Mistranslation: hypothesis contains meta-text rather than actual translation; reference is 'Forum Trust Levels' | The hypothesis is a meta-commentary string instead of the actual translation. | Expected is a phrase like “Forum Trust Levels”; hypothesis is merely a meta-description and not an actual translation. |
| tencent/HY-MT1.5-7B | Forum Level |
轻微
[准确性]
"Forum Level"
理由: Hypothesis 'Forum Level' does not match reference 'Forum Trust Levels'. Missing 'Trust' and plural form. | Partially translated; missing “Trust” present in the reference (“Forum Trust Levels”). | Missing 'Trust' in the translation of '论坛等级' as 'Forum Level' instead of 'Forum Trust Levels'. | The source '论坛等级' in the context of forum software (implied by reference) specifically refers to 'Trust Levels', not just a generic 'Level'. The hypothesis misses the specific terminology. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | ESA | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | ESA | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | ESA | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | ESA | 通过 (无共识错误) |
| Qwen/Qwen3-14B | ESA | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | ESA | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | ESA | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | English translation of ESA |
致命
[准确性]
"English translation of ESA"
理由: Meta-text indicating a translation is needed instead of giving the actual term 'ESA' as in the reference. | Invalid output: hypothesis provides meta-text about translation rather than the actual acronym 'ESA' | The hypothesis contains meta-text describing the task instead of the translation. It is also in English, violating the target language constraint. | Should be 'ESA'. | Placeholder text instead of actual translation; reference provides 'ESA' |
| google/gemma-3-12b-it | ESA | 通过 (无共识错误) |
| google/gemma-3-1b-it | ESA | 通过 (无共识错误) |
| google/gemma-3-4b-it | OK |
致命
[准确性]
"OK"
理由: The source 'ESA' is an acronym (likely European Space Agency or similar), but the hypothesis translates it as 'OK', which is a complete hallucination. | Translation is completely incorrect. 'ESA' is an acronym that should remain as 'ESA' not be translated to 'OK' | Mistranslation - source is 'ESA' but hypothesis is 'OK' | “ESA” is a proper-noun acronym and should be preserved; rendering it as “OK” completely changes the meaning and drops information. |
| google/translategemma-12b-it | ESA | 通过 (无共识错误) |
| google/translategemma-4b-it | European Space Agency | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: “ESA” is a proper name/acronym; “登录” (log in) is unrelated and incorrect. | Hallucination: 'ESA' was translated as '登录' (Login), which is completely unrelated. Acronyms should typically be preserved or expanded correctly. | Hypothesis '登录' (login) does not match source 'ESA' or reference 'ESA'. Completely incorrect translation. | Mistranslation: '登录' (login) does not match source 'ESA' or reference 'ESA' | Incorrect translation of 'ESA' as '登录' (login) |
| tencent/HY-MT1.5-7B | European Space Agency | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | AppleID | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Apple ID | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Apple ID | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | AppleID | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Apple ID | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Apple ID | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Apple ID | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Apple ID | 通过 (无共识错误) |
| google/gemma-3-12b-it | Apple ID | 通过 (无共识错误) |
| google/gemma-3-1b-it | Apple ID | 通过 (无共识错误) |
| google/gemma-3-4b-it | Apple ID | 通过 (无共识错误) |
| google/translategemma-12b-it | Apple ID | 通过 (无共识错误) |
| google/translategemma-4b-it | Skype |
致命
[准确性]
"Skype"
理由: Mistranslation of product/service name; does not correspond to 'AppleID'. | The source term 'AppleID' is translated as 'Skype', which is a completely different entity, resulting in a total loss of meaning. | Mistranslation: 'AppleID' translated to 'Skype' instead of 'Apple ID' | The translation is incorrect; 'Skype' does not correspond to 'AppleID'. | Completely incorrect translation. 'AppleID' should be translated as 'Apple ID', not 'Skype' |
| tencent/HY-MT1.5-1.8B | "en": "AppleID的翻译是\"Apple ID\"", "zh-TW": "AppleID的翻译是\'Apple ID\'", "ja": "AppleIDの翻訳は\'Apple ID\'です", "ko": "AppleID의 번역은\'Apple ID\'입니다", "de": "AppleID的翻译是\'Apple ID\'", "fr": "AppleID的翻译是\'Apple ID\'", "it": "AppleID的翻译是\'Apple ID\'" } |
致命
[准确性]
""en": "AppleID的翻译是"Apple ID"", "zh-TW": "AppleID的翻译是'Apple ID'", "ja": "AppleIDの翻訳は'Apple ID'です", "ko": "AppleID의 번역은'Apple ID'입니다", "de": "AppleID的翻译是'Apple ID'", "fr": "AppleID的翻译是'Apple ID'", "it": "AppleID的翻译是'Apple ID'" }"
理由: The hypothesis is a meta-commentary string containing multiple languages and explanation text rather than a direct translation of the source term "AppleID" into the target language Chinese, and does not match the reference term formatting "Apple ID". | Missing translation for target language zh. | The hypothesis contains a JSON object with translations into multiple languages and meta-text instead of the required Chinese translation for the source term 'AppleID'. It completely fails to provide the target language output. | Incorrect translation format - contains English text with Chinese characters mixed incorrectly |
| tencent/HY-MT1.5-7B | Apple ID | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | AliExpress International |
致命
[准确性]
"AliExpress International"
理由: Incorrect translation of 'QQ音乐' | Completely incorrect translation; unrelated brand and meaning compared to 'QQ音乐'/'QQ Music'. | Complete hallucination; the source refers to 'QQ Music', but the hypothesis translates to a completely different entity. | Completely incorrect translation; source is 'QQ音乐' (QQ Music), not AliExpress | Completely wrong translation - source is QQ音乐 but hypothesis is AliExpress International |
| CohereLabs/tiny-aya-water | QQ Music | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | QQ Music | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | QQ Music | 通过 (无共识错误) |
| Qwen/Qwen3-14B | QQ Music | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | QQ Music | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | QQ Music | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | QQ Music | 通过 (无共识错误) |
| google/gemma-3-12b-it | QQ Music | 通过 (无共识错误) |
| google/gemma-3-1b-it | QQ Music | 通过 (无共识错误) |
| google/gemma-3-4b-it | QQ Music | 通过 (无共识错误) |
| google/translategemma-12b-it | QQ Music | 通过 (无共识错误) |
| google/translategemma-4b-it | Channel |
致命
[准确性]
"Channel"
理由: Mistranslation; unrelated to the source term 'QQ音乐' (QQ Music). | Mistranslation: 'QQ音乐' (QQ Music) translated to 'Channel' instead of 'QQ Music' | The source term 'QQ 音乐' (QQ Music) is translated as 'Channel', which is unrelated to the specific brand and product mentioned. | The translation is incorrect; 'Channel' does not correspond to 'QQ音乐'. | Completely incorrect translation. 'QQ音乐' should be 'QQ Music', not 'Channel' |
| tencent/HY-MT1.5-1.8B | QQ Music | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | QQ Music | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Open Code | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Open Code |
轻微
[术语]
"Open Code"
理由: Incorrect spacing for the proper noun 'opencode'; reference uses 'OpenCode'. | Incorrect spacing; should be 'OpenCode'. | Mistranslation of product/name; reference uses branded form 'OpenCode' without space. |
| Qwen/Qwen2.5-14B-Instruct | Open Code | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | opencode | 通过 (无共识错误) |
| Qwen/Qwen3-14B | opencode | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | opencode |
轻微
[风格]
"opencode"
理由: Incorrect capitalization; should be 'OpenCode' with capital O and C | Brand/product name not capitalized as in reference; stylistic inconsistency with expected form "OpenCode". | Incorrect capitalization of the proper noun; should be 'OpenCode'. |
| Qwen/Qwen3.5-9B | OpenCode | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | opencode |
轻微
[风格]
"opencode"
理由: Capitalization mismatch: hypothesis uses lowercase 'opencode' while reference uses 'OpenCode' | Incorrect capitalization; should be 'OpenCode' with capital O and C | Incorrect capitalization for the proper noun/brand name; should be 'OpenCode'. |
| google/gemma-3-12b-it | opencode |
轻微
[风格]
"opencode"
理由: Incorrect capitalization; should be 'OpenCode' with capital O and C | Capitalization difference: 'opencode' vs 'OpenCode' | Brand/product name is not capitalized as in the reference; stylistic consistency issue. | Incorrect capitalization of the proper noun; should be 'OpenCode'. |
| google/gemma-3-1b-it | opencode | 通过 (无共识错误) |
| google/gemma-3-4b-it | Open Code |
轻微
[准确性]
"Open Code"
理由: Casing mismatch: source is lowercase 'opencode', reference is 'OpenCode', hypothesis is 'Open Code' with space | Incorrect spacing in the proper noun; reference uses 'OpenCode'. | Hypothesis uses two words 'Open Code' while reference uses single word 'OpenCode' as proper noun/product name |
| google/translategemma-12b-it | opencode |
轻微
[术语]
"opencode"
理由: Brand/stylized name casing does not match expected form "OpenCode"; should preserve correct capitalization. | Incorrect capitalization; should be 'OpenCode' with capital O and C | Should be 'OpenCode' |
| google/translategemma-4b-it | Copy |
致命
[准确性]
"Copy"
理由: The translation is incorrect; 'Copy' does not correspond to 'opencode'. | Mistranslation; 'Copy' does not reflect the product name 'opencode'. | Completely incorrect translation. 'opencode' should be 'OpenCode', not 'Copy' | Mistranslation: 'opencode' translated to 'Copy' instead of 'OpenCode' | The source term 'opencode' is translated as 'Copy', which is semantically incorrect and unrelated to the source. |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: Completely wrong translation - 'login' instead of 'OpenCode' | Incorrect translation. Should be 'OpenCode'. | The hypothesis '登录' (Login) is completely unrelated to the source 'opencode'. This appears to be a hallucination or a system error inserting a default string. | Hypothesis '登录' (login) does not match source 'opencode' or reference 'OpenCode'. Completely incorrect translation. | Mistranslation: "登录" means "log in" and is unrelated to the source term "opencode" and the reference "OpenCode". |
| tencent/HY-MT1.5-7B | Open code | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | LDC Mini Store |
轻微
[准确性]
"LDC Mini Store"
理由: '小店' is closer to 'shop' or 'store' without implying 'mini'; addition of 'Mini' slightly changes nuance. | Inaccurate translation of '小店'; should be 'Shop' not 'Mini Store' | Translation differs from reference 'LDC Shop' - 'Mini Store' vs 'Shop' |
| CohereLabs/tiny-aya-water | LDC Mini Store |
轻微
[准确性]
"Mini Store"
理由: Incorrect translation: 'Mini Store' vs reference 'Shop' | Hypothesis translates '小店' as 'Mini Store' but reference uses 'Shop' which is more accurate and concise | Less accurate than 'LDC Shop'. | Source and reference imply a generic 'shop/store'; 'Mini Store' adds unwarranted nuance of size. |
| Qwen/Qwen2.5-14B-Instruct | LDC Small Store |
轻微
[术语]
"Small Store"
理由: Terminology preference: reference uses 'Shop' not 'Small Store' | The term 'Small Store' is less precise than 'Shop'. | Incorrect translation. '小店' should be translated as 'Shop' not 'Small Store'. |
| Qwen/Qwen2.5-7B-Instruct | LDC Store |
轻微
[术语]
"LDC Store"
理由: Incorrect term; should be 'Shop' not 'Store' | The translation uses 'Store' while the reference uses 'Shop'. | Word choice: 'Store' vs reference 'Shop' | "Store" is a reasonable translation but differs from reference term "Shop"; potential terminology inconsistency. | Inconsistent terminology choice compared to reference ('Shop' vs 'Store'), though both are semantically close. |
| Qwen/Qwen3-14B | LDC Store |
轻微
[术语]
"Store"
理由: Incorrect word choice; should be 'Shop' not 'Store' | Term choice: 'Store' vs reference 'Shop' - both acceptable but different from reference | Inconsistent terminology with the reference 'Shop'; while 'Store' is a valid translation, consistency with established brand naming is preferred. |
| Qwen/Qwen3.5-4B | LDC Small Shop | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | "en": "LDC Small Shop", "zh-TW": "LDC小店", " "ja": "LDC小売店", "ko": "LDC 작은 가게", "de": "LDC Kleiner Laden", "fr": "Petite boutique LDC", "it": "Piccolo negozio LDC" } |
致命
[准确性]
""en": "LDC Small Shop", "zh-TW": "LDC小店", " "ja": "LDC小売店", "ko": "LDC 작은 가게", "de": "LDC Kleiner Laden", "fr": "Petite boutique LDC", "it": "Piccolo negozio LDC" }"
理由: Hypothesis returns a multilingual JSON-like object instead of a single Chinese translation; includes extra languages, keys, and trailing brace not present in source or reference, and mistranslates the phrase compared to reference "LDC Shop". | The hypothesis contains raw JSON code with multiple languages and formatting artifacts instead of the required single translation string. | Hypothesis contains JSON object instead of translation; does not match reference 'LDC Shop' | Missing target language translation. | Hypothesis contains malformed JSON with multiple language translations instead of a single English translation. Should be 'LDC Shop' only. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | LDC Small Store | 通过 (无共识错误) |
| google/gemma-3-12b-it | LDC Store |
轻微
[术语]
"Store"
理由: Incorrect word choice; should be 'Shop' not 'Store' | Inconsistent terminology with the reference 'Shop'; while both are valid, consistency is preferred in UI contexts. | Term choice: 'Store' vs 'Shop' - both acceptable translations but reference uses 'Shop' |
| google/gemma-3-1b-it | LDC Small Shop |
严重
[准确性]
"LDC Small Shop"
理由: Unnecessary addition of 'Small': reference is 'LDC Shop' | Mistranslation of the established name; reference omits “Small” and uses “LDC Shop” as the proper rendering. | Over-translation; '小店' should be translated as 'Shop' not 'Small Shop' |
| google/gemma-3-4b-it | LDC Mini Shop |
轻微
[准确性]
"LDC Mini Shop"
理由: Added 'Mini' which is not present in the source '小店' contextually or the reference 'Shop'. | The translation adds 'Mini', which is not present in the source or reference. | Hypothesis translates '小店' as 'Mini Shop' but reference translates it as 'Shop', omitting the 'Mini' descriptor | '小店' refers to a 'shop' or 'store'; adding 'Mini' changes the meaning and is not supported by the source or reference. | Unnecessary addition of 'Mini' not present in reference 'LDC Shop' |
| google/translategemma-12b-it | LDC Small Store |
轻微
[术语]
"LDC Small Store"
理由: Source likely refers to a named shop; "Small Store" adds nuance and differs from reference term "Shop". | Should be 'LDC Shop' | Term inconsistency: 'Small Store' vs 'Shop' | Inaccurate translation; '小店' should be translated as 'Shop' not 'Small Store' |
| google/translategemma-4b-it | LDC Mini Shop |
轻微
[准确性]
"LDC Mini Shop"
理由: Over-translation: 'LDC小店' (LDC Shop) translated as 'LDC Mini Shop', adding 'Mini' not in source | '小店' is translated as 'Mini Shop' instead of the more neutral 'Shop'; slight change of nuance but generally understandable. | Inaccurate translation. 'LDC小店' should be 'LDC Shop', not 'LDC Mini Shop'. The word '小' is omitted in the reference. | The source '小店' implies a small shop, but the reference 'LDC Shop' suggests 'Mini' is unnecessary or stylistically inconsistent with the established term. |
| tencent/HY-MT1.5-1.8B | LDC小店 |
致命
[准确性]
"LDC小店"
理由: Hypothesis is in Chinese but target language is zh and reference is in English 'LDC Shop'. Should be translated to English or appropriate target language. | Missing translation for target language zh. | The hypothesis repeats the source text. The task is labeled 'Target Language: zh', under which repeating the source 'LDC 小店' would be acceptable. However, the references for this and other items are in English (Item 2: source 'QQ 音乐', reference 'QQ Music'; Item 6: source '恋爱', reference 'Romance'), so the label appears incorrect and the task is effectively Chinese to English. Under that assumption, leaving '小店' untranslated instead of rendering it as 'Shop' or 'Store' is a critical error. |
| tencent/HY-MT1.5-7B | LDC Small Shop |
轻微
[准确性]
"Small Shop"
理由: Hypothesis translates '小店' as 'Small Shop' but reference uses 'Shop' which is more concise and appropriate for the product name | Over-translation of '小店'; 'Shop' is sufficient and more idiomatic in this context. | Unnecessary addition of 'Small' when reference is 'LDC Shop' | '小店' in this context is rendered as 'Shop' in the reference; 'Small Shop' adds an unwarranted size attribute not present in the source. |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | User support and documentation | Critical [Accuracy] "User support and documentation"<br>Reason: Complete hallucination; the source means 'Soft Router', but the hypothesis provides unrelated text about support. \| Completely incorrect translation; source is '软路由' (Soft Routers), not user support \| Completely wrong translation - source is 软路由 but hypothesis is unrelated text \| Totally unrelated to '软路由' which refers to 'soft routers' or software-based routing devices. |
| CohereLabs/tiny-aya-water | SoftRouter | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Soft Router | Minor [Accuracy] "Soft Router"<br>Reason: Singular vs plural: reference is 'Soft Routers' (plural), hypothesis is singular \| Source and reference both suggest the plural/set label "Soft Routers"; singular loses the idea of a category of items. \| Number mismatch. Should be 'Soft Routers' (plural) to match reference. |
| Qwen/Qwen2.5-7B-Instruct | Soft Router | Minor [Accuracy] "Soft Router"<br>Reason: Singular vs plural mismatch: reference uses plural 'Routers' \| Incorrect number; should be plural 'Soft Routers' \| Source "软路由" typically refers to a class of devices or setups; singular vs. plural mismatch with reference "Soft Routers". |
| Qwen/Qwen3-14B | Soft Router | Minor [Accuracy] "Soft Router"<br>Reason: Source is plural-agnostic Chinese; reference uses plural ‘Soft Routers’ suggesting a category. Singular form may be less appropriate in UI/category context. \| Missing plural form; should be 'Soft Routers' not 'Soft Router' \| Singular vs plural: hypothesis uses singular 'Router' while reference uses plural 'Routers' |
| Qwen/Qwen3.5-4B | Soft Router | Minor [Accuracy] "Soft Router"<br>Reason: Singular vs. plural mismatch with reference "Soft Routers"; could imply only one router instead of multiple. \| Singular form used; should be plural 'Soft Routers' to match reference \| Singular form used while reference uses plural 'Soft Routers' |
| Qwen/Qwen3.5-9B | Soft Router | Minor [Accuracy] "Soft Router"<br>Reason: Missing plural form 's'. \| Singular form does not match plural reference 'Soft Routers' \| Singular form used instead of plural 'Soft Routers' as in reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | SoftRouter | Pass (no consensus errors) |
| google/gemma-3-12b-it | Soft Router | Minor [Accuracy] "Soft Router"<br>Reason: Singular ‘Router’ vs. plural ‘Routers’ in reference; potential mismatch with intended category plurality. \| Missing plural form; should be 'Soft Routers' \| Singular vs plural: 'Soft Router' vs 'Soft Routers' |
| google/gemma-3-1b-it | Soft Router | Minor [Accuracy] "Soft Router"<br>Reason: Source is a generic plural concept; reference uses plural “Soft Routers” while hypothesis is singular. \| Singular vs plural mismatch: hypothesis uses singular 'Router' while reference uses plural 'Routers' \| Singular form used; reference uses plural 'Soft Routers' |
| google/gemma-3-4b-it | Soft Routing | Minor [Accuracy] "Soft Routing"<br>Reason: Singular form vs reference plural 'Soft Routers' \| The source and reference refer to 'routers' (devices) in plural, while 'routing' refers to the process; this is a slight semantic shift. \| Hypothesis uses singular 'Routing' while reference uses plural 'Routers' for '软路由' \| The translation uses 'Routing' instead of 'Routers'. |
| google/translategemma-12b-it | Software Router | Pass (no consensus errors) |
| google/translategemma-4b-it | Soft Routing | Minor [Terminology] "Soft Routing"<br>Reason: The translation uses 'Routing' instead of 'Routers', which is a minor terminology issue. \| While understandable, 'Soft Routing' is less standard than 'Soft Routers' for the concept of '软路由' in this context, deviating from the reference. \| Terminology not fully aligned; standard term in this context is closer to 'Soft Router(s)', but meaning is still understandable. |
| tencent/HY-MT1.5-1.8B | Soft Routing | Minor [Accuracy] "Soft Routing"<br>Reason: The reference uses the plural 'Soft Routers', while the hypothesis uses the singular/uncountable 'Soft Routing'. Depending on context, this might be a slight terminology mismatch, though 'Soft Routing' is often acceptable. Strictly following reference, it's a minor deviation. \| Should be plural 'Soft Routers'. \| Hypothesis 'Soft Routing' does not match reference 'Soft Routers'. Incorrect word form (singular vs plural). \| Singular vs plural mismatch with reference "Soft Routers"; meaning is close but not fully aligned to the intended noun phrase. |
| tencent/HY-MT1.5-7B | Soft Routing | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Dating | Minor [Accuracy] "Dating"<br>Reason: Inaccurate translation; '恋爱' means 'Romance' not 'Dating' \| '恋爱' emphasizes being in love/romance; 'Dating' shifts focus to the activity of going on dates rather than the state of romantic love. \| Semantic difference - 'Dating' vs reference 'Romance' \| The translation 'Dating' is too specific and does not fully capture the broader meaning of '恋爱' which refers to love or romance. |
| CohereLabs/tiny-aya-water | Dating | Major [Accuracy] "Dating"<br>Reason: '恋爱' refers to romantic love/being in a relationship; 'Dating' focuses on the activity of going on dates and changes the meaning. \| The translation 'Dating' is less accurate than the reference 'Romance' as it does not fully capture the broader romantic context implied by '恋爱'. \| Hypothesis translates '恋爱' as 'Dating' but reference uses 'Romance' which is more appropriate for the context \| Nuance mismatch; 'Dating' implies the activity, while reference 'Romance' better captures the concept of '恋爱' (love/romantic relationship). \| Inaccurate translation: 'Dating' does not match reference 'Romance' |
| Qwen/Qwen2.5-14B-Instruct | Romance | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Love | Major [Accuracy] "Love"<br>Reason: The translation 'Love' is less precise than the reference 'Romance' in capturing the intended nuance. \| Incorrect translation: '恋爱' means Romance, not Love \| Incorrect translation; should be 'Romance' not 'Love' \| Nuance error; '恋爱' specifically refers to romantic relationships ('Romance'), whereas 'Love' is broader. \| "恋爱" refers specifically to romantic relationship/romance, not generic "love"; semantic scope is incorrect. |
| Qwen/Qwen3-14B | Romance | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Love | Major [Accuracy] "Love"<br>Reason: Incorrect translation; '恋爱' means 'Romance' not 'Love' in this context \| "恋爱" in this context refers specifically to romantic relationships/romance, not generic "love"; semantic scope is too broad. \| Incorrect translation: 'Love' is not equivalent to 'Romance' in this context \| The translation 'Love' is correct but less precise than 'Romance' in this context. |
| Qwen/Qwen3.5-9B | Love | Major [Accuracy] "Love"<br>Reason: Incorrect translation. '恋爱' means 'Romance' not 'Love' in this context. \| Incorrect translation; 'Love' does not match reference 'Romance' \| Nuance error: '恋爱' specifically refers to romantic relationships or romance, whereas 'Love' is broader. 'Romance' (reference) is more precise in this context. \| "恋爱" in this context aligns better with "Romance" (romantic relationship/affair) rather than the broad concept of "Love"; semantic scope mismatch. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Love | Pass (no consensus errors) |
| google/gemma-3-12b-it | Romance | Pass (no consensus errors) |
| google/gemma-3-1b-it | 恋爱 | Minor [Accuracy] "恋爱"<br>Reason: Source text (Chinese) returned instead of target language (English) translation. \| Literal translation retained instead of proper target language equivalent: reference provides 'Romance' as the target equivalent \| Source text not translated; should be translated to 'Romance' in English \| Source Chinese term was not translated into the target language (English) as in the reference (“Romance”). |
| google/gemma-3-4b-it | Romance | Pass (no consensus errors) |
| google/translategemma-12b-it | Romance | Pass (no consensus errors) |
| google/translategemma-4b-it | Search | Critical [Accuracy] "Search"<br>Reason: The translation is incorrect; 'Search' does not correspond to '恋爱'. \| Completely incorrect translation. '恋爱' should be 'Romance', not 'Search' \| Completely incorrect; '恋爱' means 'romance/love relationship', not 'Search'. \| Mistranslation: '恋爱' (Romance) translated to 'Search' \| The source '恋爱' (Love/Romance) is translated as 'Search', which is completely unrelated and incorrect. |
| tencent/HY-MT1.5-1.8B | 恋爱 | Critical [Accuracy] "恋爱"<br>Reason: Hypothesis is in Chinese but target language is zh and reference is in English 'Romance'. Should be translated to English. \| The hypothesis repeats the Chinese source text instead of translating it to English ('Romance') as indicated by the reference and the pattern of other items. \| Source retained instead of translated - 'Romance' is the correct translation \| Missing translation for target language zh. |
| tencent/HY-MT1.5-7B | Love relationship | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Pronunciation | Major [Accuracy] "Pronunciation"<br>Reason: Incorrect translation; '双拼' refers to a specific input method (Double Pinyin), not general pronunciation. \| '双拼' is a specific input method (Shuangpin/Double Pinyin), not generic 'Pronunciation'. \| Incorrect translation; '双拼' refers to 'Shuangpin (Double Pinyin)' input method, not generic pronunciation \| Completely wrong translation - source is 双拼 but hypothesis is unrelated 'Pronunciation' |
| CohereLabs/tiny-aya-water | Pronunciation | Minor [Accuracy] "Pronunciation"<br>Reason: Hypothesis 'Pronunciation' is incorrect. '双拼' refers to 'Shuangpin (Double Pinyin)' which is a specific input method, not just pronunciation \| Incorrect translation: 'Pronunciation' does not match reference 'Shuangpin (Double Pinyin)' \| '双拼' is a specific 'Shuangpin/Double Pinyin' input method, not generic pronunciation; meaning is changed completely. \| Incorrect translation; '双拼' refers to a specific input method (Double Pinyin), not general 'Pronunciation'. |
| Qwen/Qwen2.5-14B-Instruct | Double Pinyin | Minor [Terminology] "Double Pinyin"<br>Reason: Missing the pinyin term. Should include 'Shuangpin (Double Pinyin)' for proper localization. \| Term of art is usually kept as "Shuangpin" possibly with gloss; omitting the standard transliteration may reduce recognizability for users familiar with this input method name. \| The translation does not include the term 'Shuangpin', which is part of the reference. \| Missing transliteration: reference includes 'Shuangpin' alongside translation |
| Qwen/Qwen2.5-7B-Instruct | Double Pinyin | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Double Pinyin | Minor [Accuracy] "Double Pinyin"<br>Reason: Missing the Chinese pinyin term; should include 'Shuangpin (Double Pinyin)' \| Incomplete translation; misses the specific transliteration 'Shuangpin' included in the reference for clarity. \| Should include 'Shuangpin' as in reference |
| Qwen/Qwen3.5-4B | Pinyin with Zhuyin | Major [Accuracy] "Pinyin with Zhuyin"<br>Reason: Factually incorrect translation. '双拼' refers to 'Double Pinyin', not a combination with Zhuyin. \| Incorrect translation: 'Pinyin with Zhuyin' is not the correct meaning of '双拼' which refers to Shuangpin (Double Pinyin) input method \| Misrepresents "双拼" which is a specific input method (Double Pinyin), not Pinyin combined with Zhuyin. \| Incorrect translation; '双拼' refers to 'Shuangpin (Double Pinyin)' not 'Pinyin with Zhuyin' |
| Qwen/Qwen3.5-9B | Double Pinyin | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Double cooking | Critical [Accuracy] "Double cooking"<br>Reason: Completely incorrect translation; '双拼' is a Chinese input method (Shuangpin/Double Pinyin), not 'Double cooking' \| Completely incorrect translation: '双拼' refers to 'Shuangpin (Double Pinyin)' input method, not 'Double cooking' \| '双拼' in this context refers to Shuangpin/Double Pinyin input method, not cooking; semantic mistranslation. \| Incorrect translation; should be 'Shuangpin (Double Pinyin)' \| Severe mistranslation; '双拼' refers to a Pinyin input method ('Double Pinyin'), not a cooking technique. |
| google/gemma-3-12b-it | Double Pinyin | Pass (no consensus errors) |
| google/gemma-3-1b-it | Translation of '双拼' in en | Critical [Accuracy] "Translation of '双拼' in en"<br>Reason: Should be translated to 'Shuangpin (Double Pinyin)' instead of providing a translation instruction \| Placeholder text instead of actual translation; should be 'Shuangpin (Double Pinyin)' \| Placeholder text; does not convey the intended meaning or term (“Shuangpin (Double Pinyin)”). \| Model failed to translate, outputting a meta-commentary placeholder instead. \| Placeholder text instead of actual translation; reference provides 'Shuangpin (Double Pinyin)' |
| google/gemma-3-4b-it | Close | Critical [Accuracy] "Close"<br>Reason: Completely mistranslated; reference provides correct term 'Shuangpin (Double Pinyin)' \| '双拼' refers to 'Shuangpin / Double Pinyin', an input method; 'Close' is completely unrelated and mistranslates the term. \| Hypothesis 'Close' is completely incorrect translation of '双拼' (Shuangpin/Double Pinyin). This is a critical mistranslation. \| The translation is incorrect; it should be 'Shuangpin (Double Pinyin)'. \| Complete mistranslation; 'Close' does not match source '双拼' (Double Pinyin) or reference. |
| google/translategemma-12b-it | Double Pinyin | Minor [Terminology] "Double Pinyin"<br>Reason: Incomplete translation; should include pinyin term 'Shuangpin (Double Pinyin)' for clarity \| The translation does not include 'Shuangpin', which is part of the reference. \| Missing transliteration: reference includes 'Shuangpin' while hypothesis only has 'Double Pinyin' |
| google/translategemma-4b-it | Pinyin | Major [Accuracy] "Pinyin"<br>Reason: Overly general; omits 'double' aspect of '双拼', which specifically refers to Shuangpin (Double Pinyin). \| The source '双拼' refers to a specific input method (Double Pinyin/Shuangpin). Translating it simply as 'Pinyin' loses the specific distinction of the input scheme. \| Partial translation: '双拼' (Shuangpin/Double Pinyin) translated only to 'Pinyin', missing the 'double' meaning \| Incomplete translation. '双拼' should be 'Shuangpin (Double Pinyin)', not just 'Pinyin' |
| tencent/HY-MT1.5-1.8B | Double Pinyin | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Double spelling | Major [Terminology] "Double spelling"<br>Reason: Inaccurate translation; reference provides correct term 'Shuangpin (Double Pinyin)' \| The term 'Double spelling' is used instead of 'Shuangpin (Double Pinyin)'. \| Incorrect translation of the input method '双拼'; the standard term is 'Double Pinyin' or 'Shuangpin'. \| '双拼' refers to a specific Chinese input method, commonly 'Shuangpin' or 'Double Pinyin'; 'Double spelling' is misleading and fails to identify the correct technical term. |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Matrix | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Matrix | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | matrix | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | matrix | Pass (no consensus errors) |
| Qwen/Qwen3-14B | matrix | Minor [Style] "matrix"<br>Reason: Does not follow reference capitalization ‘Matrix’; likely a proper noun or title requiring initial capital. \| Incorrect capitalization; as a proper noun or title, it should be capitalized as 'Matrix'. \| Incorrect capitalization; should be 'Matrix' with capital M |
| Qwen/Qwen3.5-4B | matrix | Minor [Style] "matrix"<br>Reason: Incorrect capitalization; should be 'Matrix' with capital M \| Incorrect capitalization; should be 'Matrix'. \| Capitalization differs from reference "Matrix"; stylistic inconsistency for a title/name. |
| Qwen/Qwen3.5-9B | Matrix | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Matrix | Pass (no consensus errors) |
| google/gemma-3-12b-it | matrix | Minor [Style] "matrix"<br>Reason: Capitalization difference: 'matrix' vs 'Matrix' \| Not capitalized as in reference ‘Matrix’; formatting/capitalization issue. \| Incorrect capitalization; should be 'Matrix' with capital M \| Incorrect capitalization; proper nouns or titles should be capitalized as 'Matrix'. |
| google/gemma-3-1b-it | matrix | Pass (no consensus errors) |
| google/gemma-3-4b-it | matrix | Minor [Style] "matrix"<br>Reason: Capitalization does not match the expected 'Matrix' form; minor stylistic inconsistency for a name. \| Capitalization: reference has 'Matrix' \| Hypothesis uses lowercase 'matrix' while reference uses capitalized 'Matrix' \| The translation uses lowercase instead of title case. \| Incorrect capitalization; reference uses title case 'Matrix'. |
| google/translategemma-12b-it | matrix | Minor [Style] "matrix"<br>Reason: Capitalization difference: 'matrix' vs 'Matrix' \| Incorrect capitalization; should be 'Matrix'. \| The translation should be capitalized as 'Matrix' to match the reference. |
| google/translategemma-4b-it | matrix | Minor [Style] "matrix"<br>Reason: Capitalization differs from reference ('Matrix'); minor stylistic inconsistency, meaning unchanged. \| Capitalization: 'matrix' should be 'Matrix' to match proper noun in reference \| Capitalization error. 'matrix' should be capitalized as 'Matrix' \| The hypothesis uses lowercase 'matrix' while the reference uses title case 'Matrix'. For proper nouns or titles, capitalization should match. |
| tencent/HY-MT1.5-1.8B | 登录 | Critical [Accuracy] "登录"<br>Reason: Hypothesis '登录' (login) does not match source 'matrix' or reference 'Matrix'. Completely incorrect translation. \| Mistranslation: "登录" means "log in" and does not correspond to the source term "matrix" or the reference "Matrix". \| Completely wrong translation - 'login' instead of 'Matrix' \| The hypothesis '登录' (Login) is completely unrelated to the source 'matrix'. Hallucination. \| Incorrect translation. Should be 'Matrix'. |
| tencent/HY-MT1.5-7B | 矩阵 | Major [Accuracy] "矩阵"<br>Reason: Translation to Chinese when reference retains English 'Matrix' - possible over-translation \| Wrong target language; the source is English and should remain English or be transliterated if a brand, but the reference indicates it should stay as 'Matrix'. \| Hypothesis is in Chinese (矩阵) but target language is English; should be 'Matrix' per reference |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Width of home | Minor [Accuracy] "Width of home"<br>Reason: Literal misinterpretation; '家宽' is industry shorthand for residential broadband, not physical width of a home. \| Literal mistranslation of the abbreviation '家宽'; it stands for 'Residential Broadband', not the physical width of a house. \| Completely incorrect translation; '家宽' means 'Residential Broadband', not 'Width of home' \| Incorrect translation - reference is 'Residential Broadband' |
| CohereLabs/tiny-aya-water | Home width | Major [Accuracy] "Home width"<br>Reason: '家宽' is shorthand for residential broadband; literal 'Home width' is semantically wrong and misleading. \| Hypothesis 'Home width' is a literal mistranslation. '家宽' means 'Residential Broadband', not 'Home width' \| Literal mistranslation of '家宽' (Home Broadband); 'width' is incorrect in this context. \| Incorrect translation; should be 'Residential Broadband'. \| Incorrect translation: 'Home width' does not match reference 'Residential Broadband' |
| Qwen/Qwen2.5-14B-Instruct | Home Broadband | Minor [Terminology] "Home Broadband"<br>Reason: The translation 'Home Broadband' is less commonly used than 'Residential Broadband' in the reference. \| Incorrect translation. '家宽' should be 'Residential Broadband' not 'Home Broadband'. \| Terminology preference: reference uses 'Residential Broadband' |
| Qwen/Qwen2.5-7B-Instruct | Home宽带 | Major [Accuracy] "Home宽带"<br>Reason: Language mixing error; the output contains Chinese characters ('宽带') despite the target language context implying full English translation ('Residential Broadband'). \| Does not clearly convey "家宽" as residential broadband service; mixed-language phrase is ambiguous compared to "Residential Broadband". \| Mixed language and incorrect translation; should be 'Residential Broadband' \| Unnecessary mixing of English and Chinese; reference uses fully English term 'Residential Broadband' |
| Qwen/Qwen3-14B | Home Broadband | Minor [Terminology] "Home Broadband"<br>Reason: Terminology choice differs from the more formal/standard 'Residential Broadband' used in the reference. \| Incorrect terminology; should be 'Residential Broadband' not 'Home Broadband' \| Term choice: 'Home Broadband' vs reference 'Residential Broadband' - different terminology |
| Qwen/Qwen3.5-4B | Home Broadband | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Home Broadband | Minor [Accuracy] "Home Broadband"<br>Reason: Meaning is close, but the standard term in the reference is "Residential Broadband"; slight mismatch in register/precision. \| Incorrect translation. Should be 'Residential Broadband' not 'Home Broadband'. \| Does not match reference 'Residential Broadband' - slightly different meaning |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Home Broadband | Pass (no consensus errors) |
| google/gemma-3-12b-it | Home Broadband | Minor [Terminology] "Home Broadband"<br>Reason: Terminology choice differs from the reference 'Residential Broadband'; 'Residential' is often the standard technical term in this context. \| Incorrect terminology; should be 'Residential Broadband' \| Term choice: 'Home Broadband' vs 'Residential Broadband' - both valid but reference prefers 'Residential' \| The translation uses 'Home' instead of the reference's 'Residential'. |
| google/gemma-3-1b-it | House width | Major [Accuracy] "House width"<br>Reason: Literal mistranslation; the intended meaning is network service type (“Residential Broadband”), not physical width of a house. \| Incorrect translation; '家宽' means 'Residential Broadband' not 'House width' \| Should be translated to 'Residential Broadband' instead of a literal translation \| Severe mistranslation. '家宽' refers to 'Residential Broadband', not the physical width of a house. \| Incorrect translation: reference provides 'Residential Broadband' as the correct term |
| google/gemma-3-4b-it | Home Broadband | Minor [Accuracy] "Home Broadband"<br>Reason: Hypothesis translates '家宽' as 'Home Broadband' while reference uses 'Residential Broadband', which is more accurate terminology \| The translation uses 'Home' instead of 'Residential'. \| 'Home Broadband' is close in meaning but slightly less precise than 'Residential Broadband'; minor nuance difference. |
| google/translategemma-12b-it | Home Broadband | Pass (no consensus errors) |
| google/translategemma-4b-it | Home Broadband | Minor [Terminology] "Home Broadband"<br>Reason: Different but acceptable rendering of '家宽'; generally synonymous with 'Residential Broadband'. \| Terminology variation: '家宽' (Residential Broadband) translated as 'Home Broadband', acceptable but different from reference \| While 'Home Broadband' is accurate, 'Residential Broadband' is the more formal industry term preferred in the reference. |
| tencent/HY-MT1.5-1.8B | 家宽 | Critical [Accuracy] "家宽"<br>Reason: Source retained instead of translated - 'Residential Broadband' is the correct translation \| Hypothesis is in Chinese but target language is zh and reference is in English 'Residential Broadband'. Should be translated to English. \| Missing translation for target language zh. \| The hypothesis repeats the Chinese source text instead of translating it to English ('Residential Broadband'). |
| tencent/HY-MT1.5-7B | Home width | Critical [Accuracy] "Home width"<br>Reason: '家宽' refers to residential broadband service; 'Home width' is a literal misinterpretation and incorrect meaning. \| Hypothesis 'Home width' is an incorrect literal translation; reference 'Residential Broadband' is the correct technical term for '家宽' \| Nonsensical literal translation; '家宽' refers to 'Residential Broadband'. |

| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | 92 Topics | Minor [Accuracy] "92 Topics"<br>Reason: Plural form difference - hypothesis has 'Topics' but reference is 'Topic' (singular) \| Pluralization differs from reference '92 Topic'; slight change in intended category label. \| Incorrect pluralization; should be 'Topic' (singular) to match reference |
| CohereLabs/tiny-aya-water | 92 Topics | Minor [Accuracy] "Topics"<br>Reason: Plural form vs reference singular '92 Topic' \| Hypothesis uses plural 'Topics' while reference uses singular 'Topic' which is the correct form \| Pluralization differs from reference 'Topic'; likely refers to a named forum/section where singular is expected. \| Number disagreement; source '话题' can be singular or plural, but reference specifies singular 'Topic'. |
| Qwen/Qwen2.5-14B-Instruct | 92 Topics | Minor [Accuracy] "92 Topics"<br>Reason: Singular vs plural: reference is '92 Topic' (singular), hypothesis is plural \| Number mismatch. Should be singular 'Topic' not plural 'Topics'. \| Source can mean a single topic or general concept; reference chooses singular as label. Pluralization changes the intended granularity slightly. |
| Qwen/Qwen2.5-7B-Instruct | 92 Topics | Minor [Accuracy] "92 Topics"<br>Reason: Source "92 话题" is singular; "Topics" introduces a number mismatch with the singular reference "Topic". \| Incorrect number; should be singular 'Topic' not plural 'Topics' \| Plural vs singular: reference uses singular 'Topic' |
| Qwen/Qwen3-14B | 92 Topics | Minor [Style] "92 Topics"<br>Reason: Number disagreement; the reference uses the singular 'Topic', suggesting a category label rather than a count. \| Plural vs singular ‘Topic’ is a stylistic/UI-label choice; meaning is largely preserved but differs from reference wording. \| Should be singular 'Topic' |
| Qwen/Qwen3.5-4B | 92 Topics | Minor [Accuracy] "92 Topics"<br>Reason: Pluralization differs from reference "92 Topic"; number agreement may depend on category naming convention. \| Plural form used while reference uses singular '92 Topic' \| Incorrect plural form; should be singular 'Topic' to match reference |
| Qwen/Qwen3.5-9B | Topic 92 | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Topic 92 | Minor [Style] "Topic 92"<br>Reason: Unconventional order; '92 Topic' is more appropriate \| Word order differs from reference: hypothesis uses 'Topic 92' while reference uses '92 Topic' \| Incorrect word order; should be '92 Topic' not 'Topic 92' \| Word order deviation from the source structure; '92 Topic' preserves the numeric prefix emphasis better. |
| google/gemma-3-12b-it | 92 Topics | Minor [Accuracy] "92 Topics"<br>Reason: Plural ‘Topics’ vs. singular ‘Topic’ in reference; slight mismatch with intended category label. \| Singular vs plural: '92 Topics' vs '92 Topic' - reference uses singular \| Incorrect plural form; should be 'Topic' (singular) |
| google/gemma-3-1b-it | Translation of '92 话题' in en | Critical [Accuracy] "Translation of '92 话题' in en"<br>Reason: Placeholder text instead of actual translation; reference provides '92 Topic' \| Model failed to translate, outputting a meta-commentary placeholder instead. \| Should be translated to '92 Topic' instead of providing a translation instruction \| Placeholder text without actual translation of the term (“92 Topic”). \| Placeholder text instead of actual translation; should be '92 Topic' |
| google/gemma-3-4b-it | 92 Topics | Minor [Accuracy] "92 Topics"<br>Reason: Plural 'Topics' vs reference 'Topic'; small number mismatch, but meaning is largely preserved. \| Plural form vs reference singular '92 Topic' \| Number disagreement; source implies singular or collective, reference uses singular 'Topic'. \| Hypothesis uses plural 'Topics' while reference uses singular 'Topic' |
| google/translategemma-12b-it | 92 topics | Minor [Style] "topics"<br>Reason: The translation uses 'topics' instead of 'Topic' as in the reference. \| Plural 'topics' used instead of singular 'Topic' as per reference convention. \| Capitalization difference: 'topics' vs 'Topic' |
| google/translategemma-4b-it | Reload | Critical [Accuracy] "Reload"<br>Reason: The source '92 话题' (92 Topics) is translated as 'Reload', which is completely unrelated to the source content. \| Mistranslation; bears no relation to '92 话题' ('92 Topic'). \| The translation is incorrect; 'Reload' does not correspond to '92 话题'. \| Mistranslation: '92 话题' (92 Topic) translated to 'Reload' \| Completely incorrect translation. '92 话题' should be '92 Topic', not 'Reload' |
| tencent/HY-MT1.5-1.8B | Translation of '92 话题' in en | Critical [Accuracy] "Translation of '92 话题' in en"<br>Reason: Incorrect output format - contains English meta text instead of actual translation \| Hypothesis is a meta-description rather than actual translation. Should be '92 Topic' as per reference. \| Missing translation for target language zh. \| The hypothesis is meta-text describing a translation rather than providing the translation "92 Topic"; it does not convey the source meaning. \| The hypothesis is a meta-comment describing the translation task rather than providing the actual translation ('92 Topic'). |
| tencent/HY-MT1.5-7B | 92 Topics | Minor [Accuracy] "Topics"<br>Reason: Hypothesis uses plural 'Topics' while reference uses singular 'Topic' which matches the source \| Pluralization differs from the singular 'Topic' in the reference, suggesting a slightly different scope. \| Plural form used while reference uses singular '92 Topic' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Robot |
轻微
[准确性]
"Robot"
理由: Singular vs plural - reference is 'Robots / Bots' | Incomplete translation; should be 'Robots / Bots' (plural form with alternative) | Source '机器人' is generic and can be singular or plural; reference suggests 'Robots / Bots'. While 'Robot' is not strictly wrong, it lacks the plural nuance often implied in UI contexts compared to the reference. |
| CohereLabs/tiny-aya-water | Robot |
轻微
[准确性]
"Robot"
理由: Singular vs plural; reference suggests both 'Robots / Bots' as a category label. | Singular form vs reference plural 'Robots / Bots' | Hypothesis uses singular 'Robot' while reference uses plural 'Robots / Bots' which is the correct form | The source text '机器人' is translated into English ('Robot') instead of the target language Chinese (e.g., '机器人'). The reference confirms the target should be Chinese. |
| Qwen/Qwen2.5-14B-Instruct | Robot |
轻微
[准确性]
"Robot"
理由: Reference suggests a category label including plural or bots; singular "Robot" under-represents the broader set of robots/bots. | Singular vs plural: reference is 'Robots / Bots' (plural), hypothesis is singular | Incomplete translation. Should be 'Robots / Bots' to match reference options. |
| Qwen/Qwen2.5-7B-Instruct | robot | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Robot |
轻微
[准确性]
"Robot"
理由: Singular vs plural: hypothesis uses singular 'Robot' while reference uses plural 'Robots / Bots' | Incomplete translation; should include alternatives 'Robots / Bots' or at least plural form | Singular ‘Robot’ does not match the plural sense ‘Robots / Bots’ indicated by the reference, which suggests a category or multiple items. | Incomplete coverage; the source term often encompasses both 'Robots' and 'Bots' in this context, as reflected in the reference. |
| Qwen/Qwen3.5-4B | Robot |
轻微
[准确性]
"Robot"
理由: Singular form used while reference uses plural 'Robots / Bots' | Singular vs. plural mismatch; source and reference suggest a category covering multiple "Robots / Bots" rather than a single robot. | Singular form used; should be plural 'Robots' or include alternative 'Bots' to match reference |
| Qwen/Qwen3.5-9B | Robot |
轻微
[准确性]
"Robot"
理由: Singular form does not match plural reference 'Robots / Bots' | Singular form used instead of plural 'Robots' as in reference. | Source and reference suggest plural/collective "Robots / Bots" while hypothesis is singular; overall meaning is close but number and inclusion of "bots" nuance are lost. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Robot |
轻微
[风格]
"Robot"
理由: Singular form differs from reference plural 'Robots / Bots' | Singular form used where the reference suggests a plural or broader category ('Robots / Bots'). | Singular form is less common; 'Robots' is preferred |
| google/gemma-3-12b-it | Robot |
轻微
[准确性]
"Robot"
理由: Singular vs plural: 'Robot' vs 'Robots / Bots' - reference uses plural | Incomplete translation; the source '机器人' in this context likely encompasses 'Bots' as indicated by the reference 'Robots / Bots'. | Incomplete translation; should be 'Robots / Bots' or similar | Singular ‘Robot’ covers only one sense of ‘机器人’, while reference includes plural ‘Robots / Bots’; narrower than intended. |
| google/gemma-3-1b-it | Robot |
轻微
[准确性]
"Robot"
理由: Singular vs plural mismatch: hypothesis uses singular 'Robot' while reference uses plural 'Robots / Bots' | Number mismatch; reference covers plural (“Robots / Bots”) while hypothesis is singular. | Singular form used; reference uses plural 'Robots / Bots' |
| google/gemma-3-4b-it | Confirm |
致命
[准确性]
"Confirm"
理由: Complete mistranslation; 'Confirm' is unrelated to source '机器人' (Robots). | '机器人' means 'robots/bots'; 'Confirm' is completely unrelated and changes the function/meaning. | Hypothesis 'Confirm' is completely incorrect translation of '机器人' (Robots/Bots). This is a critical mistranslation. | The translation is incorrect; it should be 'Robots / Bots'. | Completely incorrect translation; reference is 'Robots / Bots' |
| google/translategemma-12b-it | Robot |
轻微
[准确性]
"Robot"
理由: Singular 'Robot' may not fully capture the scope of 'Robots / Bots' implied by the source. | Incorrect number; should be plural 'Robots' or include alternative 'Bots' | Singular vs plural: 'Robot' vs 'Robots / Bots' |
| google/translategemma-4b-it | More |
致命
[准确性]
"More"
理由: Mistranslation: '机器人' (Robots/Bots) translated to 'More' | The source '机器人' (Robots/Bots) is translated as 'More', which is semantically incorrect. | The translation is incorrect; 'More' does not correspond to '机器人'. | Completely incorrect translation. '机器人' should be 'Robots / Bots', not 'More' | Mistranslation; '机器人' means 'Robots/Bots', not 'More'. |
| tencent/HY-MT1.5-1.8B | Robot's Translation in English |
致命
[准确性]
"Robot's Translation in English"
理由: Hypothesis is a meta-description rather than actual translation. Should be 'Robots / Bots' as per reference. | The hypothesis is a meta-comment describing the translation task rather than providing the actual translation ('Robots / Bots'). | The hypothesis is meta-text about translation instead of translating "机器人" to its English meaning (e.g., "Robots / Bots"). | Missing translation for target language zh. |
| tencent/HY-MT1.5-7B | Assistant | Major [Accuracy] "Assistant" Reason: Hypothesis translates '机器人' as 'Assistant' but reference uses 'Robots / Bots' which is the correct translation \| Semantic shift; '机器人' means 'Robot' or 'Bot', whereas 'Assistant' implies a specific function not present in the source. \| '机器人' denotes 'robots' or 'bots'; translating as 'Assistant' changes the entity type and meaning. |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Category | Minor [Accuracy] "Category" Reason: Singular vs plural - reference is 'Categories' \| Singular vs plural ('Categories'); minor mismatch in UI/category label. \| Source '类别' is generic. Reference uses plural 'Categories'. In UI contexts, this usually implies a list of categories, making the singular 'Category' slightly less accurate than the plural form. \| Incorrect number; should be plural 'Categories' to match reference |
| CohereLabs/tiny-aya-water | Category | Minor [Accuracy] "Category" Reason: Singular vs plural; reference UI label is 'Categories' for a list of types. \| Singular form vs reference plural 'Categories' \| The source text '类别' is translated into English ('Category') instead of the target language Chinese (e.g., '类别'). The reference confirms the target should be Chinese. \| Hypothesis uses singular 'Category' while reference uses plural 'Categories' which is the correct form |
| Qwen/Qwen2.5-14B-Instruct | Category | Minor [Accuracy] "Category" Reason: Number mismatch. Should be plural 'Categories' not singular 'Category'. \| Source can be generic; in UI context reference uses plural as a menu/category label. Singular slightly misaligns with intended usage. \| Singular vs plural: reference is 'Categories' (plural), hypothesis is singular |
| Qwen/Qwen2.5-7B-Instruct | Category | Minor [Accuracy] "Category" Reason: Singular "Category" may not match intended plural sense of "类别" in categorization contexts; reference uses plural "Categories". \| Incorrect number; should be plural 'Categories' \| Singular vs plural: reference uses plural 'Categories' |
| Qwen/Qwen3-14B | Category | Minor [Accuracy] "Category" Reason: Singular form does not reflect the plural ‘Categories’ used in the reference, which better matches a UI section listing multiple categories. \| Singular vs plural: hypothesis uses singular 'Category' while reference uses plural 'Categories' \| Incorrect plural form; should be 'Categories' not 'Category' |
| Qwen/Qwen3.5-4B | Category | Minor [Accuracy] "Category" Reason: Singular vs. plural mismatch with "Categories"; likely a label for multiple categories. \| Singular form used while reference uses plural 'Categories' \| Singular form used; should be plural 'Categories' to match reference |
| Qwen/Qwen3.5-9B | Category | Minor [Accuracy] "Category" Reason: Singular form does not match plural reference 'Categories' \| Singular vs plural mismatch compared to reference "Categories"; meaning is essentially the same but number differs. \| Singular form used instead of plural 'Categories' as in reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Category | Minor [Style] "Category" Reason: Singular form used where the reference indicates a plural category listing ('Categories'). \| Singular form is less common; 'Categories' is preferred \| Singular form differs from reference plural 'Categories' |
| google/gemma-3-12b-it | Category | Minor [Accuracy] "Category" Reason: Singular vs plural: 'Category' vs 'Categories' - reference uses plural \| Missing plural form; should be 'Categories' \| Singular ‘Category’ vs. plural ‘Categories’ in reference; may not fully match the intended UI label. |
| google/gemma-3-1b-it | Category | Minor [Accuracy] "Category" Reason: Number mismatch; reference uses plural (“Categories”) while hypothesis is singular. \| Singular vs plural mismatch: hypothesis uses singular 'Category' while reference uses plural 'Categories' \| Singular form used; reference uses plural 'Categories' |
| google/gemma-3-4b-it | Category | Minor [Accuracy] "Category" Reason: Singular vs plural ('Categories'); small mismatch in number. \| Number disagreement; reference uses plural 'Categories'. \| Hypothesis uses singular 'Category' while reference uses plural 'Categories' \| Singular vs reference plural 'Categories' |
| google/translategemma-12b-it | Category | Pass (no consensus errors) |
| google/translategemma-4b-it | Category | Minor [Accuracy] "Category" Reason: Singular/plural mismatch: '类别' (Categories) translated as singular 'Category' \| Singular vs plural mismatch ('类别' here corresponds to 'Categories'); slight meaning deviation but close. \| Number mismatch. '类别' should be 'Categories' (plural), not 'Category' (singular) |
| tencent/HY-MT1.5-1.8B | Category Translation in English | Critical [Accuracy] "Category Translation in English" Reason: The hypothesis is a meta-comment describing the translation task rather than providing the actual translation ('Categories'). \| Hypothesis is a meta-description rather than actual translation. Should be 'Categories' as per reference. \| Incorrect output format - contains English meta text instead of actual translation \| Missing translation for target language zh. \| The hypothesis is meta-text describing a translation instead of providing the translated term corresponding to "类别" / "Categories". |
| tencent/HY-MT1.5-7B | Category | Minor [Accuracy] "Category" Reason: Singular instead of plural 'Categories' as in the reference; slight mismatch in number. \| Singular form used while reference uses plural 'Categories' \| Hypothesis uses singular 'Category' while reference uses plural 'Categories' which better matches the source context |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | We have detected that you may be interested in our products or services. | Critical [Accuracy] "We have detected that you may be interested in our products or services." Reason: Severe hallucination. The source '推广' means 'Promotion' or 'Promote'. The hypothesis generates a full sentence unrelated to the concise label required, completely failing to convey the source meaning. \| Incorrect translation of '推广' \| Completely incorrect translation; source is '推广' (Promotions), not a detection message \| Completely wrong translation - source is 推广 but hypothesis is unrelated text \| Paraphrases a marketing sentence instead of the noun '推广' ('Promotions'); meaning and structure are completely changed. |
| CohereLabs/tiny-aya-water | Promotion | Minor [Accuracy] "Promotion" Reason: Singular vs plural; in this context 'Promotions' as a section/category is more accurate. \| Hypothesis uses singular 'Promotion' while reference uses plural 'Promotions' which is the correct form \| The source text '推广' is translated into English ('Promotion') instead of the target language Chinese (e.g., '推广'). The reference confirms the target should be Chinese. \| Singular form vs reference plural 'Promotions' |
| Qwen/Qwen2.5-14B-Instruct | Promotion | Minor [Accuracy] "Promotion" Reason: Reference suggests a category of multiple promotions; singular can be interpreted as one specific promotion rather than a set. \| Number mismatch. Should be plural 'Promotions' not singular 'Promotion'. \| Singular vs plural: reference is 'Promotions' (plural), hypothesis is singular |
| Qwen/Qwen2.5-7B-Instruct | Promote | Minor [Accuracy] "Promote" Reason: Word form: reference uses noun 'Promotions' rather than verb 'Promote' \| Part of speech mismatch; source '推广' in this context likely refers to the noun concept 'Promotions' rather than the verb 'Promote'. \| Incorrect form; should be 'Promotions' not 'Promote' \| Verb form "Promote" does not match nominal sense of "推广" as in "Promotions" or "promotion activities"; part-of-speech mismatch. |
| Qwen/Qwen3-14B | Promote | Minor [Accuracy] "Promote" Reason: Part of speech error; the source implies a noun (Promotions/Marketing), but the hypothesis translates it as a verb (Promote). \| Word form: hypothesis uses verb 'Promote' while reference uses noun 'Promotions' \| Verb form ‘Promote’ vs noun ‘Promotions’ in reference; may not match intended function label (section for promotions rather than an action). \| Incorrect form; should be 'Promotions' (noun) not 'Promote' (verb) |
| Qwen/Qwen3.5-4B | Promote | Major [Accuracy] "Promote" Reason: Incorrect word form; should be 'Promotions' (noun) not 'Promote' (verb) \| Verb form does not match nominal sense of "推广" as used in categories; should be a noun like "Promotion/Promotions". \| Verb form used while reference uses noun form 'Promotions' |
| Qwen/Qwen3.5-9B | Promote | Minor [Accuracy] "Promote" Reason: "推广" as a noun here corresponds to "Promotions" in reference; hypothesis uses a verb form, changing part of speech and usage. \| Verb form does not match plural noun reference 'Promotions' \| Singular form used instead of plural 'Promotions' as in reference. \| Part of speech error: Source '推广' in this context likely refers to the noun 'Promotions' (as per reference), but hypothesis translates it as the verb 'Promote'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Promotion | Minor [Style] "Promotion" Reason: Singular form differs from reference plural 'Promotions' \| Singular form is less common; 'Promotions' is preferred \| Singular form used where the reference indicates a plural category ('Promotions'). |
| google/gemma-3-12b-it | Promotion | Minor [Accuracy] "Promotion" Reason: Missing plural form; should be 'Promotions' \| Singular ‘Promotion’ vs. plural ‘Promotions’ in reference; slight mismatch with category concept. \| Singular vs plural: 'Promotion' vs 'Promotions' - reference uses plural |
| google/gemma-3-1b-it | Promotion | Minor [Accuracy] "Promotion" Reason: Singular form used; reference uses plural 'Promotions' \| Number mismatch; reference uses plural (“Promotions”) while hypothesis is singular. \| Singular vs plural mismatch: hypothesis uses singular 'Promotion' while reference uses plural 'Promotions' |
| google/gemma-3-4b-it | Confirm | Critical [Accuracy] "Confirm" Reason: Complete mistranslation; 'Confirm' is unrelated to source '推广' (Promotions). \| The translation is incorrect; it should be 'Promotions'. \| Completely incorrect translation; reference is 'Promotions' \| Hypothesis 'Confirm' is completely incorrect translation of '推广' (Promotions). This is a critical mistranslation. \| '推广' refers to 'promotion/promotions'; 'Confirm' is an unrelated UI action and a clear mistranslation. |
| google/translategemma-12b-it | Promotion | Pass (no consensus errors) |
| google/translategemma-4b-it | Promotion | Minor [Accuracy] "Promotion" Reason: Singular vs plural mismatch vs reference 'Promotions'; meaning mostly preserved. \| Singular/plural mismatch: '推广' (Promotions) translated as singular 'Promotion' \| Number mismatch. '推广' should be 'Promotions' (plural), not 'Promotion' (singular) |
| tencent/HY-MT1.5-1.8B | 推广的翻译是 | Critical [Accuracy] "推广的翻译是" Reason: Hypothesis is incomplete meta-description meaning 'translation of 推广 is'. Should be 'Promotions' as per reference. \| Incomplete translation - ends prematurely, missing actual translation \| Missing translation for target language zh. \| Incomplete and meta: states "The translation of 推广 is" without giving the actual translation "Promotions" or equivalent content. \| The hypothesis is an incomplete sentence meaning 'The translation of 推广 is', failing to provide the actual translation ('Promotions'). |
| tencent/HY-MT1.5-7B | Promotion | Minor [Accuracy] "Promotion" Reason: Singular form used while reference uses plural 'Promotions' \| Hypothesis uses singular 'Promotion' while reference uses plural 'Promotions' which better matches the source \| Singular instead of plural 'Promotions' in the reference; scope mismatch but meaning is close. |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Drawio | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Drawio | Major [Accuracy] "Drawio" Reason: Missing period in brand name: 'Drawio' vs reference 'Draw.io' \| The source text 'drawio' is a proper noun/brand name that should remain unchanged or be transliterated into Chinese, but the hypothesis outputs English. The reference indicates the expected output format differs (likely retaining the brand or translating contextually), but primarily the language is wrong. \| Hypothesis 'Drawio' is missing the dot. Reference correctly shows 'Draw.io' which is the proper brand name format |
| Qwen/Qwen2.5-14B-Instruct | drawio | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | drawio | Pass (no consensus errors) |
| Qwen/Qwen3-14B | draw.io | Minor [Style] "draw.io" Reason: Incorrect capitalization for the brand name; should be 'Draw.io'. \| Incorrect capitalization; should be 'Draw.io' with capital D \| Differs only in capitalization from reference ‘Draw.io’; minor stylistic inconsistency with product naming. |
| Qwen/Qwen3.5-4B | draw.io | Minor [Style] "draw.io" Reason: Incorrect capitalization; brand name should be 'Draw.io'. \| Incorrect capitalization; should be 'Draw.io' with capital D \| Capitalization differs from reference "Draw.io"; minor stylistic inconsistency for product name. |
| Qwen/Qwen3.5-9B | draw.io | Minor [Style] "draw.io" Reason: Capitalization: The proper noun/brand name should be capitalized as 'Draw.io' per the reference. \| Case difference: 'draw.io' vs reference 'Draw.io' \| Capitalization differs. Should be 'Draw.io' with capital D as in reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | draw.io | Minor [Style] "draw.io" Reason: Incorrect capitalization for the brand name; should be 'Draw.io'. \| Incorrect capitalization; should be 'Draw.io' with capital D \| Capitalization mismatch: reference uses 'Draw.io' with capital D |
| google/gemma-3-12b-it | drawio | Minor [Style] "drawio" Reason: Incorrect formatting; should be 'Draw.io' with capital D and period \| Brand name capitalization/formatting differs from reference ‘Draw.io’. \| Capitalization difference: 'drawio' vs 'Draw.io' \| Incorrect branding format; the official name is 'Draw.io'. |
| google/gemma-3-1b-it | drawio | Pass (no consensus errors) |
| google/gemma-3-4b-it | draw.io | Minor [Style] "draw.io" Reason: The translation uses lowercase instead of title case. \| Incorrect capitalization; reference uses 'Draw.io'. \| Hypothesis uses lowercase 'draw.io' while reference uses capitalized 'Draw.io' |
| google/translategemma-12b-it | drawio | Pass (no consensus errors) |
| google/translategemma-4b-it | draw.io | Minor [Style] "draw.io" Reason: The hypothesis uses lowercase 'd' while the reference uses title case 'Draw.io'. Brand names should maintain consistent capitalization. \| Capitalization error. 'draw.io' should be capitalized as 'Draw.io' \| Only differs by capitalization from 'Draw.io'; branding/case style issue, not meaning. \| Capitalization: 'drawio' translated as 'draw.io' but reference is 'Draw.io' |
| tencent/HY-MT1.5-1.8B | 登录 | Critical [Accuracy] "登录" Reason: The hypothesis '登录' (Login) is completely unrelated to the source 'drawio'. Hallucination. \| Incorrect translation. Should be 'Draw.io'. \| Completely wrong translation - 'login' instead of 'Draw.io' \| Hypothesis '登录' (login) does not match source 'drawio' or reference 'Draw.io'. Completely incorrect translation. \| Mistranslation: "登录" (log in) is unrelated to the source term "drawio" and the reference "Draw.io". |
| tencent/HY-MT1.5-7B | DrawIO | Pass (no consensus errors) |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Cart | Major [Accuracy] "Cart" Reason: '购物' is 'Shopping' (activity); 'Cart' refers to a specific feature, not the general activity. \| Incorrect translation; '购物' means 'Shopping', not 'Cart' \| Mistranslation. Source '购物' means 'Shopping' (the activity). 'Cart' refers to '购物车'. These are distinct concepts in e-commerce UI. \| Semantic difference - 'Cart' vs reference 'Shopping' |
| CohereLabs/tiny-aya-water | Shopping | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Shopping | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Shopping | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Shopping | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Shopping | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Shopping | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Shopping | Pass (no consensus errors) |
| google/gemma-3-12b-it | Shopping | Pass (no consensus errors) |
| google/gemma-3-1b-it | Shopping | Pass (no consensus errors) |
| google/gemma-3-4b-it | Confirm | Critical [Accuracy] "Confirm" Reason: The translation is incorrect; it should be 'Shopping'. \| Hypothesis 'Confirm' is completely incorrect translation of '购物' (Shopping). This is a critical mistranslation. \| Complete mistranslation; 'Confirm' is unrelated to source '购物' (Shopping). \| '购物' means 'shopping'; 'Confirm' is unrelated and changes the intended action. \| Completely incorrect translation; reference is 'Shopping' |
| google/translategemma-12b-it | Shopping | Pass (no consensus errors) |
| google/translategemma-4b-it | Shopping | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | 登录 | Critical [Accuracy] "登录" Reason: Completely wrong translation - 'login' instead of 'Shopping' \| Hypothesis '登录' (login) does not match source '购物' or reference 'Shopping'. Completely incorrect translation. \| Incorrect translation. Should be 'Shopping'. \| The hypothesis '登录' (Login) is completely unrelated to the source '购物' (Shopping). Hallucination. \| Mistranslation: "登录" (log in) does not convey the meaning of "购物" (shopping) or match the reference "Shopping". |
| tencent/HY-MT1.5-7B | Shopping | Pass (no consensus errors) |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Vote | Minor [Accuracy] "Vote" Reason: Nuance mismatch. Source '投票' can mean the action 'Vote' or the event 'Poll'. Given the reference 'Polls', the context likely implies a list of polls rather than the action button. 'Vote' is acceptable but less precise for a section header. \| Singular vs plural - reference is 'Polls' \| '投票' as a section label can mean 'polls'; 'Vote' is close but not identical to reference plural 'Polls'. \| Incorrect number; should be plural 'Polls' to match reference |
| CohereLabs/tiny-aya-water | Vote | Major [Accuracy] "Vote" Reason: The source text '投票' is translated into English ('Vote') instead of the target language Chinese (e.g., '投票'). Additionally, 'Vote' may not fully capture the nuance of 'Polls' (reference) depending on context, but the primary error is the wrong target language. \| Does not match the reference 'Polls'. \| '投票' in UI normally refers to polls/surveys; 'Vote' (verb/single act) does not match the feature name 'Polls'. \| Incorrect translation: 'Vote' does not match reference 'Polls' \| Hypothesis uses singular 'Vote' while reference uses plural 'Polls' which is the correct form and more appropriate term |
| Qwen/Qwen2.5-14B-Instruct | Vote | Major [Terminology] "Vote" Reason: Terminology preference: reference uses 'Polls' not 'Vote' \| Incorrect translation. '投票' should be 'Polls' not 'Vote'. \| The translation 'Vote' does not match the reference 'Polls', which is the correct term for this context. |
| Qwen/Qwen2.5-7B-Instruct | Vote | Minor [Accuracy] "Vote" Reason: Semantic nuance; '投票' as a feature label usually refers to 'Polls' (the event/system) rather than the action 'Vote'. \| Incorrect translation; should be 'Polls' not 'Vote' \| Incorrect translation: '投票' means Polls, not Vote \| "投票" in this context matches "Polls" as a feature/section name; "Vote" as a verb/instance is less accurate. |
| Qwen/Qwen3-14B | Vote | Minor [Accuracy] "Vote" Reason: Should be 'Polls' \| Incorrect translation; should be 'Polls' not 'Vote' \| Translates ‘投票’ as an action, while reference ‘Polls’ indicates a noun/category; may not match intended UI function. \| Nuance error; 'Vote' refers to the action, whereas 'Polls' (reference) refers to the feature or object, which fits the source context better. |
| Qwen/Qwen3.5-4B | Vote | Major [Accuracy] "Vote" Reason: Incorrect translation; '投票' should be 'Polls' not 'Vote' \| Terminology difference: 'Vote' vs 'Polls' - different meanings in this context \| Translates as a verb while the category label refers to polls/surveys; "Polls" better matches intended function. |
| Qwen/Qwen3.5-9B | Vote | Minor [Accuracy] "Vote" Reason: Incorrect translation. '投票' means 'Polls' not 'Vote'. \| Does not match reference 'Polls' - different meaning \| Semantic nuance: '投票' in a UI context often refers to 'Polls' (the event/system), whereas 'Vote' refers to the action. Reference prefers 'Polls'. \| "投票" in this context refers to polls as a feature/category; "Vote" is an action, whereas reference uses the noun "Polls"; part-of-speech and functional mismatch. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Vote | Major [Accuracy] "Vote" Reason: Source '投票' as a category label is closer to 'Polls' than 'Vote'; 'Vote' sounds like a one-time action rather than a collection of polls. \| Does not capture the context of polls \| Inaccurate translation; '投票' should be translated as 'Polls' not 'Vote' |
| google/gemma-3-12b-it | Vote | Minor [Terminology] "Vote" Reason: Incorrect word choice; should be 'Polls' not 'Vote' \| Term choice: 'Vote' vs 'Polls' - different but related terms \| The translation uses 'Vote' instead of the reference's 'Polls'. |
| google/gemma-3-1b-it | Voting | Pass (no consensus errors) |
| google/gemma-3-4b-it | Vote | Minor [Accuracy] "Vote" Reason: The translation uses 'Vote' instead of 'Polls'. \| Hypothesis uses singular 'Vote' while reference uses plural 'Polls' for '投票' \| Semantic nuance error; source '投票' in this context aligns better with reference 'Polls' rather than the action 'Vote'. \| Different word choice; reference uses 'Polls' \| '投票' as a feature label more closely matches 'Polls'; 'Vote' captures only part of the concept but is understandable. |
| google/translategemma-12b-it | Vote | Pass (no consensus errors) |
| google/translategemma-4b-it | Exit | Critical [Accuracy] "Exit" Reason: Completely incorrect; '投票' means 'vote/poll', not 'Exit'. \| Mistranslation: '投票' (Polls) translated to 'Exit' \| The source '投票' (Vote/Polls) is translated as 'Exit', which is completely unrelated and incorrect. \| Completely incorrect translation. '投票' should be 'Polls', not 'Exit' \| The translation is incorrect; 'Exit' does not correspond to '投票'. |
| tencent/HY-MT1.5-1.8B | 登录 | Critical [Accuracy] "登录" Reason: Hypothesis '登录' (login) does not match source '投票' or reference 'Polls'. Completely incorrect translation. \| Mistranslation: "登录" (log in) is unrelated to "投票" (voting) and does not match the reference "Polls". \| Completely wrong translation - 'login' instead of 'Polls' \| The hypothesis '登录' (Login) is completely unrelated to the source '投票' (Polls). Hallucination. \| Incorrect translation. Should be 'Polls'. |
| tencent/HY-MT1.5-7B | Vote | Minor [Accuracy] "Vote" Reason: Source and reference suggest a feature/category 'Polls'; 'Vote' refers more to the action than the poll objects. \| Nuance error; 'Polls' better captures the meaning of '投票' in a community/forum context than the action 'Vote'. \| Hypothesis uses singular 'Vote' while reference uses plural 'Polls' which is the more appropriate translation of '投票' \| Inconsistent terminology: 'Vote' vs reference 'Polls' |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Anthropic | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Anthropic | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Anthropic | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Anthropic | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Anthropic | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Anthropic | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Anthropic | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Anthropic | Pass (no consensus errors) |
| google/gemma-3-12b-it | Anthropic | Pass (no consensus errors) |
| google/gemma-3-1b-it | Anthropic | Pass (no consensus errors) |
| google/gemma-3-4b-it | Save | Critical [Accuracy] "Save" Reason: 'Anthropic' is a proper name; 'Save' is a completely different meaning and function. \| Complete mistranslation; 'Save' is unrelated to the proper noun 'Anthropic'. \| The translation is incorrect; it should be 'Anthropic'. \| Completely incorrect translation; reference is 'Anthropic' (proper noun should be preserved) \| Hypothesis 'Save' is completely incorrect translation of 'Anthropic' (a proper noun/company name). This is a critical mistranslation. |
| google/translategemma-12b-it | Anthropic | Pass (no consensus errors) |
| google/translategemma-4b-it | Anthropic | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Anthropic | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Anthropic | Pass (no consensus errors) |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Programming language | Minor [Accuracy] "Programming language" Reason: Number mismatch. Source '编程语言' is generic. Reference uses plural 'Programming Languages', which is standard for category headers listing multiple items. Singular form is less natural in this context. \| Singular vs plural; source and reference indicate a plural concept. \| Singular vs plural - reference is 'Programming Languages' \| Incorrect number; should be plural 'Programming Languages' to match reference |
| CohereLabs/tiny-aya-water | Programming language | Minor [Accuracy] "Programming language" Reason: Singular vs plural; reference indicates a category 'Programming Languages' rather than one language. \| Singular form vs reference plural 'Programming Languages' \| The source text '编程语言' is translated into English ('Programming language') instead of the target language Chinese (e.g., '编程语言'). The reference confirms the target should be Chinese. \| Hypothesis uses singular 'Programming language' while reference uses plural 'Programming Languages' which is the correct form |
| Qwen/Qwen2.5-14B-Instruct | Programming Language | Minor [Accuracy] "Programming Language" Reason: Reference indicates a category of multiple languages; singular may misrepresent the breadth of the content. \| Singular vs plural: reference is 'Programming Languages' (plural), hypothesis is singular \| Number mismatch. Should be plural 'Programming Languages' not singular. |
| Qwen/Qwen2.5-7B-Instruct | Programming language | Minor [Accuracy] "Programming language" Reason: Incorrect number; should be plural 'Programming Languages' \| Singular vs plural: reference uses plural 'Programming Languages' \| Singular form may not reflect the general plural sense of "编程语言"; reference "Programming Languages" suggests a category rather than a single language. |
| Qwen/Qwen3-14B | Programming Language | Minor [Accuracy] "Programming Language" Reason: Singular form does not match plural ‘Programming Languages’ in reference, which suggests a list or category of languages. \| Singular vs plural: hypothesis uses singular 'Language' while reference uses plural 'Languages' \| Incorrect plural form; should be 'Programming Languages' not 'Programming Language' |
| Qwen/Qwen3.5-4B | Programming Language | Minor [Accuracy] "Programming Language" Reason: Singular form used; should be plural 'Programming Languages' to match reference \| Singular form used while reference uses plural 'Programming Languages' \| Singular vs. plural mismatch with "Programming Languages"; likely refers to multiple languages in a category context. |
| Qwen/Qwen3.5-9B | Programming language | Minor [Accuracy] "Programming language" Reason: Singular vs plural mismatch compared to reference "Programming Languages"; meaning otherwise correct. \| Singular form does not match plural reference 'Programming Languages' \| Singular form used instead of plural 'Programming Languages' as in reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Programming Language | Minor [Style] "Programming Language" Reason: Singular form used where the reference indicates a plural category ('Programming Languages'). \| Singular form differs from reference plural 'Programming Languages' \| Singular form is less common; 'Programming Languages' is preferred |
| google/gemma-3-12b-it | Programming Language | Minor [Accuracy] "Programming Language" Reason: Missing plural form; should be 'Programming Languages' \| Singular ‘Language’ vs. plural ‘Languages’; may not align with category covering multiple languages. \| Singular vs plural: 'Programming Language' vs 'Programming Languages' - reference uses plural |
| google/gemma-3-1b-it | Programming language | Minor [Accuracy] "Programming language" Reason: Number mismatch; reference uses plural (“Programming Languages”) while hypothesis is singular. \| Singular form used; reference uses plural 'Programming Languages' \| Singular vs plural mismatch: hypothesis uses singular 'language' while reference uses plural 'Languages' |
| google/gemma-3-4b-it | Programming Language | Minor [Accuracy] "Programming Language" Reason: Singular vs plural ('Programming Languages'); small number mismatch. \| Singular vs reference plural 'Programming Languages' \| Number disagreement; reference uses plural 'Programming Languages'. \| Hypothesis uses singular 'Programming Language' while reference uses plural 'Programming Languages' |
| google/translategemma-12b-it | Programming language | Pass (no consensus errors) |
| google/translategemma-4b-it | Programming Language | Minor [Accuracy] "Programming Language" Reason: Number mismatch. '编程语言' should be 'Programming Languages' (plural), not 'Programming Language' (singular) \| Singular/plural mismatch: '编程语言' (Programming Languages) translated as singular 'Programming Language' \| Singular vs plural mismatch vs '编程语言' used as category 'Programming Languages'; meaning largely preserved. |
| tencent/HY-MT1.5-1.8B | Programming Language | Minor [Accuracy] "Programming Language" Reason: The reference uses the plural 'Programming Languages', while the hypothesis uses the singular. This is a minor number agreement issue. \| Should be plural 'Programming Languages'. \| Hypothesis 'Programming Language' (singular) does not match reference 'Programming Languages' (plural). \| Number mismatch with reference "Programming Languages"; the hypothesis uses singular where plural is expected, slightly narrowing the intended scope. |
| tencent/HY-MT1.5-7B | Programming language | Minor [Accuracy] "Programming language" Reason: Singular form vs plural 'Programming Languages' in the reference; slight scope mismatch. \| Singular form used while reference uses plural 'Programming Languages' \| Hypothesis uses singular 'Programming language' while reference uses plural 'Programming Languages' which matches the source |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Invalid input | Critical [Accuracy] "Invalid input" Reason: Complete hallucination and failure to translate. Source '沙雕图' refers to 'funny memes' or 'shitposts'. The hypothesis claims the input is invalid, which is factually incorrect regarding the source text. \| Completely incorrect translation; '沙雕图' means 'Shitposts / Funny Memes', not 'Invalid input' \| Incorrect translation of '沙雕图' \| Hypothesis is not a translation but an error message indicating invalid input. The translation should be 'Shitposts / Funny Memes' or equivalent in Chinese. \| Not a translation; '沙雕图' refers to silly/funny images or memes, not input validity. |
| CohereLabs/tiny-aya-water | absurd drawing | Major [Accuracy] "absurd drawing" Reason: The source text '沙雕图' (slang for funny/silly images/memes) is translated into English ('absurd drawing') instead of the target language Chinese. Furthermore, 'absurd drawing' is a poor semantic match for '沙雕图' which implies humor/memes rather than just 'absurd'. \| Hypothesis 'absurd drawing' is inaccurate. '沙雕图' refers to 'Shitposts / Funny Memes' which is a specific internet culture term, not just absurd drawings \| '沙雕图' refers to silly/funny meme images; 'absurd drawing' loses the internet-meme and humorous connotations. \| Inaccurate translation: 'absurd drawing' does not match reference 'Shitposts / Funny Memes' \| Does not match the reference 'Shitposts / Funny Memes'. |
| Qwen/Qwen2.5-14B-Instruct | Caricature Image | Major [Accuracy] "Caricature Image" Reason: Severe semantic error; '沙雕图' refers to funny memes/shitposts, not formal 'Caricature Images'. \| "沙雕图" in internet slang means silly/stupid-funny pictures or shitposts, not caricatures; the proposed translation changes the humorous, meme-like meaning. \| Completely incorrect translation. '沙雕图' should be 'Shitposts / Funny Memes' not 'Caricature Image'. |
| Qwen/Qwen2.5-7B-Instruct | Cute Image | Major [Accuracy] "Cute Image" Reason: "沙雕图" refers to silly/stupid/funny meme images, not "cute" images; meaning is substantially changed. \| Significant meaning distortion; '沙雕图' refers to silly, absurd, or meme-like images ('Shitposts/Funny Memes'), not necessarily 'Cute' ones. \| Incorrect translation; should be 'Shitposts / Funny Memes' not 'Cute Image' \| Incorrect translation: '沙雕图' means Shitposts/Funny Memes, not Cute Image |
| Qwen/Qwen3-14B | Sand Sculpture Picture | Minor [Accuracy] "Sand Sculpture Picture" Reason: Severe mistranslation; the source is slang for funny/meme images, not literal pictures of sand sculptures. The reference 'Shitposts / Funny Memes' captures the correct meaning. \| Completely incorrect translation; should be 'Shitposts / Funny Memes' which captures the actual meaning of the Chinese slang term \| Should be 'Shitposts / Funny Memes' \| Literal translation of slang ‘沙雕图’ misses the idiomatic meaning (silly/funny memes); the result is misleading and unnatural in the target context. |
| Qwen/Qwen3.5-4B | Meme image | Major [Accuracy] "Meme image" Reason: Incomplete translation; should be 'Shitposts / Funny Memes' to match reference \| Partial translation: 'Meme image' does not capture the informal/slang meaning 'Shitposts' in the reference \| Covers only memes while "沙雕图" in reference includes broader shitposts/funny images; partial semantic coverage. |
| Qwen/Qwen3.5-9B | Meme image | Minor [Accuracy] "Meme image" Reason: Incomplete translation. Should be 'Shitposts / Funny Memes' to capture the full meaning. \| Captures humorous image aspect but omits the stronger "shitposts" nuance present in the reference; partial semantic coverage. \| Partial match; reference suggests 'Shitposts / Funny Memes' \| Less specific than reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Shallow Drawing | Critical [Accuracy] "Shallow Drawing" Reason: Severe mistranslation; '沙雕图' is slang for funny memes or shitposts, literally misinterpreted as 'shallow carving/drawing'. \| Completely incorrect translation; '沙雕图' is internet slang for humorous/funny content, should be 'Shitposts / Funny Memes' not 'Shallow Drawing' \| Incorrect translation; should be 'Shitposts / Funny Memes' \| Completely incorrect translation: '沙雕图' is internet slang for 'Shitposts / Funny Memes', not 'Shallow Drawing' \| '沙雕图' is internet slang meaning silly/funny memes or shitposts, not 'Shallow Drawing'; clear semantic mistranslation. |
| google/gemma-3-12b-it | Funny meme image | Pass (no consensus errors) |
| google/gemma-3-1b-it | Sandjoy picture | Major [Accuracy] "Sandjoy picture" Reason: Incorrect translation: reference provides 'Shitposts / Funny Memes' as the correct equivalent; 'Sandjoy' is a mistranslation of the Chinese term \| Severe mistranslation; fails to convey intended colloquial meaning of humorous or shitpost-style images (“Shitposts / Funny Memes”). \| Nonsensical literal translation of slang. '沙雕' means silly/ridiculous/funny, not 'sand joy'. Should be 'Shitposts' or 'Funny Memes'. \| Should be translated to 'Shitposts / Funny Memes' instead of an incorrect translation \| Incorrect translation; '沙雕图' should be 'Shitposts / Funny Memes' not 'Sandjoy picture' |
| google/gemma-3-4b-it | Joke picture | Pass (no consensus errors) |
| google/translategemma-12b-it | Funny picture | Minor [Terminology] "Funny picture" Reason: Term choice: 'Funny picture' vs 'Shitposts / Funny Memes' - less precise translation \| Inaccurate translation; '沙雕图' should be 'Shitposts / Funny Memes' not 'Funny picture' \| Should be 'Shitposts / Funny Memes' |
| google/translategemma-4b-it | Subtitle | Critical [Accuracy] "Subtitle" Reason: Completely incorrect translation. '沙雕图' should be 'Shitposts / Funny Memes', not 'Subtitle' \| Mistranslation: '沙雕图' (Shitposts/Funny Memes) translated to 'Subtitle' \| The translation is incorrect; 'Subtitle' does not correspond to '沙雕图'. \| Mistranslation; '沙雕图' refers to silly/funny meme images, not 'Subtitle'. \| The source '沙雕图' (funny memes/shitposts) is translated as 'Subtitle', which is completely unrelated to the meaning of the source. |
| tencent/HY-MT1.5-1.8B | 沙雕图 | Critical [Accuracy] "沙雕图" Reason: The hypothesis repeats the Chinese source text instead of translating it to English ('Shitposts / Funny Memes'). \| Source retained instead of translated - 'Shitposts / Funny Memes' is the correct translation \| Hypothesis is in Chinese but target language is zh and reference is in English 'Shitposts / Funny Memes'. Should be translated to English. \| Missing translation for target language zh. |
| tencent/HY-MT1.5-7B | Cartoon image | Major [Accuracy] "Cartoon image" Reason: '沙雕图' refers to silly or absurd meme images; 'Cartoon image' loses the humorous/meme-specific connotation. \| Loss of connotation; '沙雕图' refers to funny/meme images ('Shitposts'), not just cartoons. \| Hypothesis 'Cartoon image' is an incorrect literal translation; reference 'Shitposts / Funny Memes' captures the actual meaning of '沙雕图' in internet culture |
| Model under test | MT output (Hypothesis) | Adjudication (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | seedance2 | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | seedance2 | Minor [Accuracy] "seedance2" Reason: Capitalization mismatch: 'seedance2' vs reference 'Seedance2' \| Hypothesis uses lowercase 'seedance2' while reference uses proper capitalization 'Seedance2' for the brand name \| The source text is outputted in English/Latin script without translation or transliteration into Chinese, violating the target language constraint (zh). |
| Qwen/Qwen2.5-14B-Instruct | seedance2 | Minor [Style] "seedance2" Reason: The correct capitalization is 'Seedance2'. \| Incorrect capitalization. Should be 'Seedance2' with capital S. \| Reference capitalizes the name as "Seedance2"; keeping it all lowercase is a minor inconsistency with the branded form. |
| Qwen/Qwen2.5-7B-Instruct | seedance2 | Pass (no consensus errors) |
| Qwen/Qwen3-14B | seedance2 | Minor [Style] "seedance2" Reason: Incorrect capitalization; should be 'Seedance2' with capital S \| Does not follow reference capitalization ‘Seedance2’; minor inconsistency with the named entity’s preferred form. \| Incorrect capitalization for the proper noun/model name; should be 'Seedance2'. |
| Qwen/Qwen3.5-4B | seedance2 | Minor [Style] "seedance2" Reason: Incorrect capitalization; should be 'Seedance2' with capital S \| Incorrect capitalization; should be 'Seedance2'. \| Capitalization differs from reference "Seedance2"; minor stylistic inconsistency for name. |
| Qwen/Qwen3.5-9B | seedance2 | Minor [Style] "seedance2" Reason: Lowercase initial letter differs from reference "Seedance2"; minor capitalization/style inconsistency. \| Case difference: lowercase 'seedance2' vs reference 'Seedance2' \| Capitalization: As a proper noun/model name, it should be capitalized 'Seedance2' per the reference. \| Capitalization differs. Should be 'Seedance2' with capital S as in reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | seedance2 | Minor [Style] "seedance2" Reason: Incorrect capitalization; should be 'Seedance2' with capital S \| Incorrect capitalization for the proper noun/model name; should be 'Seedance2'. \| Capitalization mismatch: reference uses 'Seedance2' with capital S |
| google/gemma-3-12b-it | seedance2 | Minor [Style] "seedance2" Reason: Incorrect capitalization; should be 'Seedance2' with capital S \| Capitalization difference: 'seedance2' vs 'Seedance2' \| Incorrect capitalization of the proper noun; should be 'Seedance2'. \| Not capitalized like reference ‘Seedance2’; stylistic/branding consistency issue. |
| google/gemma-3-1b-it | seedance2 | Pass (no consensus errors) |
| google/gemma-3-4b-it | seedance | Major [Accuracy] "seedance" Reason: Omission of version number '2' present in source 'seedance2' and reference. \| The translation omits the '2' from the source. \| Missing numeral '2' present in both source 'seedance2' and reference 'Seedance2' \| Hypothesis omits the '2' from 'seedance2', while reference correctly includes it as 'Seedance2' \| Omission of the '2' changes the specific identifier/name 'seedance2' to 'seedance', losing version/variant information. |
| google/translategemma-12b-it | seedance2 | Minor [Terminology] "seedance2" Reason: Incorrect capitalization; should be 'Seedance2' with capital S \| Casing does not match expected stylized name "Seedance2"; should preserve proper capitalization. \| Should be 'Seedance2' |
| google/translategemma-4b-it | seedance2 | Minor [Style] "seedance2" Reason: Capitalization differs from reference 'Seedance2'; minor stylistic/branding issue. \| Capitalization error. 'seedance2' should be capitalized as 'Seedance2' \| Capitalization: 'seedance2' should be 'Seedance2' to match proper noun in reference \| The hypothesis uses all lowercase while the reference uses title case 'Seedance2'. Consistency in branding capitalization is expected. |
| tencent/HY-MT1.5-1.8B | seedance2 | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | seedance2 | Pass (no consensus errors) |
| Model under test | MT output (Hypothesis) | Adjudication (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The quick brown fox jumps over the lazy dog. | Critical [Accuracy] "The quick brown fox jumps over the lazy dog." Reason: Completely unrelated content; does not convey the meaning of “草台班子” (amateur/poorly-run team). \| The hypothesis is completely unrelated to the source text '草台班子' (Amateur Troupe), representing a total hallucination or data mismatch. \| Complete mistranslation - source is Chinese idiom meaning amateur troupe or poorly-run team, but hypothesis is English pangram with no relation to source \| Hypothesis is completely unrelated to source text. Should translate '草台班子' (amateur troupe/poorly-run team) not provide an English idiom. |
| CohereLabs/tiny-aya-water | We provide high-quality products with excellent customer service. | Critical [Accuracy] "We provide high-quality products with excellent customer service." Reason: Completely unrelated content; mistranslation of idiomatic expression “草台班子” meaning an amateur or poorly run team. \| Hypothesis is completely unrelated to source. Source '草台班子' means 'amateur troupe/poorly-run team' but hypothesis describes a business service offering. \| The translation does not match the source meaning. \| Complete mistranslation - source is a Chinese idiom meaning amateur troupe or poorly-run team, but hypothesis translates as if source is a brand slogan \| The hypothesis is a complete hallucination unrelated to the source text '草台班子' (which means an amateur or makeshift troupe). The meaning is entirely lost. |
| Qwen/Qwen2.5-14B-Instruct | Ad hoc group | Major [Accuracy] "Ad hoc group" Reason: The translation does not capture the amateur or poorly-run aspect of the source text. \| The translation fails to capture the derogatory and specific cultural nuance of '草台班子', which implies an amateurish, makeshift, or poorly organized group. 'Ad hoc group' is neutral and misses the critical connotation of incompetence or lack of professionalism indicated by the reference 'Amateur Troupe (Poorly-run Team)'. \| Mistranslation of '草台班子'. The term refers to an amateur or poorly-run troupe/team, not an ad hoc group. Reference provides more accurate meaning. \| “草台班子” is a colloquial/derogatory term meaning an amateur, poorly run team; “ad hoc group” misses the negative, unprofessional nuance. \| Translation does not match reference meaning - 'Ad hoc group' differs from 'Amateur Troupe (Poorly-run Team)' |
| Qwen/Qwen2.5-7B-Instruct | rural troupe | Major [Accuracy] "rural troupe" Reason: The translation is accurate but less precise than the reference. \| “草台班子” is an idiom meaning an unprofessional, makeshift or poorly run team, not specifically a rural troupe. \| Incorrect translation - '草台班子' means amateur/shoddy team, not specifically rural troupe \| The source '草台班子' is an idiom referring to an amateurish, makeshift, or poorly organized group. The translation 'rural troupe' interprets the characters literally (grass/stage) and misses the figurative meaning of incompetence or amateurism conveyed in the reference. \| Hypothesis misses the negative connotation of '草台班子' (poorly-run/amateurish team). 'Rural troupe' is too literal and doesn't capture the derogatory meaning present in the source and reference. |
| Qwen/Qwen3-14B | amateurish group | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Make-do troupe | Major [Accuracy] "Make-do troupe" Reason: Translation does not convey the intended meaning of 'amateur/poorly-run team' - 'make-do' implies makeshift/inadequate which partially captures the negative connotation but misses the 'team/organization' aspect and cultural reference \| "草台班子" is a colloquial term meaning an amateurish or poorly run group; "make-do troupe" sounds neutral and does not convey the sense of incompetence or makeshift nature. \| The translation is somewhat accurate but not as precise as the reference. \| The translation 'Make-do troupe' is unnatural and fails to convey the specific cultural nuance of '草台班子' (an amateurish, makeshift, or poorly organized group), which is better captured by 'Amateur Troupe' or similar terms in the reference. \| Hypothesis uses 'Make-do' which is less accurate than 'Amateur' or 'Poorly-run'. The reference provides a more complete and accurate translation capturing both the amateur and poorly-run aspects of '草台班子'. |
| Qwen/Qwen3.5-9B | Amateur group / makeshift team | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Executive Committee | Critical [Accuracy] "Executive Committee" Reason: Mistranslation; the source idiom refers to an amateur/poorly run team, not a formal governing body. \| The source '草台班子' refers to an amateur, makeshift, or poorly organized group. The translation 'Executive Committee' implies a formal, high-level governing body, which is the opposite of the source meaning. \| Mistranslation - '草台班子' means amateur troupe or poorly-organized team, not Executive Committee \| The translation does not match the reference and conveys a different meaning. \| Completely incorrect translation. '草台班子' means an amateur or poorly-run troupe/team, not an executive committee. |
| google/gemma-3-12b-it | Amateurs | Major [Accuracy] "Amateurs" Reason: Translation omits the cultural connotation of '班子' (team/troupe) and the idiomatic meaning of '草台班子' (improvised, poorly-organized group) \| Incomplete translation. '草台班子' means an amateur or poorly-run team/troupe, not just 'Amateurs'. Missing the organizational context. \| Misses the core meaning of “草台班子”, which refers to a poorly organized / makeshift team or troupe, not simply “amateurs” without the notion of being ramshackle or poorly run. \| The translation is too simplified and loses the nuance of the original term. \| The source '草台班子' refers to a makeshift, amateurish troupe or organization (often implying poor quality or improvisation), whereas 'Amateurs' simply refers to non-professionals without the connotation of a disorganized group or makeshift nature. The translation misses the specific cultural nuance and collective noun aspect. |
| google/gemma-3-1b-it | 草台班子 | Major [Accuracy] "草台班子" Reason: Source text not translated to target language (English). Hypothesis should be 'Amateur Troupe' or similar English translation, not the Chinese source itself. \| The hypothesis fails to translate the source text into the target language (Chinese), instead copying the source characters. The target language is specified as Chinese, but the source is already Chinese, implying the task is likely Chinese-to-English based on the reference. If the task is indeed Source(ZH)->Target(ZH), copying is correct, but the reference is English, indicating a Source(ZH)->Target(EN) task. Given the reference is English, the hypothesis should be English. Copying Chinese characters when English is expected is a critical accuracy error. \| The source idiom means an amateur or poorly run group; leaving it untranslated in Chinese fails to convey the intended meaning in the target language context. |
| google/gemma-3-4b-it | Bunch of riffraff | Major [Accuracy] "Bunch of riffraff" Reason: Mistranslates the idiom; focuses on low-quality people rather than an amateur/poorly-run team or group setup. \| Translation is inaccurate. '草台班子' means an amateur or poorly-run team/troupe, not 'bunch of riffraff'. The reference 'Amateur Troupe (Poorly-run Team)' is more accurate. \| Mistranslation - '草台班子' means amateur/troupe, not riffraff \| The translation conveys a negative connotation which may not be fully accurate. \| The translation 'Bunch of riffraff' carries a derogatory connotation implying low-class people, whereas the source '草台班子' refers to an amateur, makeshift, or poorly organized team without necessarily insulting the character of the individuals. The reference 'Amateur Troupe' captures the intended meaning more accurately. |
| google/translategemma-12b-it | Amateur team | Minor [Accuracy] "Amateur team" Reason: Translation omits the cultural connotation of 草台班子 (improvised, poorly-organized group) and is incomplete compared to reference 'Amateur Troupe (Poorly-run Team)' \| Missing key connotation: '草台班子' implies a poorly-run or unprofessional team, not just an amateur one. Reference includes '(Poorly-run Team)' to capture this nuance. \| The translation is accurate but could be more precise. 'Troupe' might be a better fit than 'team' in this context. \| The source idiom "草台班子" implies a poorly organized or unprofessional group; "Amateur team" omits the negative / poorly-run nuance. |
| google/translategemma-4b-it | Exit | Critical [Accuracy] "Exit" Reason: The translation 'Exit' is completely unrelated to the source meaning 'Amateur Troupe' or 'makeshift team'. It appears to be a hallucination or UI label mismatch. \| Complete mistranslation - 'Exit' has no relation to '草台班子' (Amateur Troupe/Poorly-run Team) \| The translation does not match the source or reference meaning. \| Hypothesis 'Exit' does not match source '草台班子' (Amateur Troupe). Complete mistranslation. \| Mistranslation; unrelated to the source idiom '草台班子', which means an amateur or poorly run team. |
| tencent/HY-MT1.5-1.8B | Translation of '草台班子' in en | Critical [Accuracy] "Translation of '草台班子' in en" Reason: The hypothesis does not provide a translation but merely states the intent to translate. \| The hypothesis is a meta-commentary describing the task rather than providing the actual translation of the source text. \| Hypothesis is a placeholder/template rather than an actual translation \| Not translated; meta-statement about translation instead of conveying meaning 'amateur troupe / poorly-run team'. \| Hypothesis is a meta-description rather than an actual translation. Should provide the translation equivalent like 'Amateur Troupe' or similar. |
| tencent/HY-MT1.5-7B | grassroots team | Major [Accuracy] "grassroots team" Reason: Mistranslates the idiom “草台班子”, which means an amateur, makeshift or poorly run group, not a grassroots team. \| The term 'grassroots team' implies a bottom-up community organization, whereas '草台班子' refers to an amateur, makeshift, or poorly organized group. The translation misses the negative connotation of incompetence or improvisation. \| Inaccurate translation. '草台班子' means an amateur or poorly-run team/troupe, not a grassroots team. The reference 'Amateur Troupe (Poorly-run Team)' is more accurate. \| The translation is correct but less precise than the reference. \| Mistranslation - 'grassroots team' does not capture the meaning of '草台班子' which refers to an amateur or poorly-organized group |
| Model under test | MT output (Hypothesis) | Adjudication (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The maximum number of products is {max} items | Critical [Accuracy] "The maximum number of products is {max} items" Reason: Hypothesis is completely unrelated to source text. Should translate '今日最佳' (Best of the Day) not a product count message. \| Mistranslation; “今日最佳” means “Best of the Day,” not a product quantity limit message. \| Complete mistranslation - source means 'Best of the Day' but hypothesis is unrelated product quantity text \| The hypothesis translates a completely different sentence regarding product limits, failing to translate the source '今日最佳' (Best of the Day). |
| CohereLabs/tiny-aya-water | Best of Today | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Today's Best | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Today's Best | Minor [Style] "Today's Best" Reason: While acceptable, 'Best of the Day' is more natural and idiomatic than 'Today's Best' for this phrase. \| Capitalization and wording are acceptable but slightly awkward; “Best of the Day” is a more natural expression. \| Slight stylistic difference from reference 'Best of the Day' |
| Qwen/Qwen3-14B | Today's Best | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Today's Best | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Best Today | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Today's Best | Pass (no consensus errors) |
| google/gemma-3-12b-it | Today's Best | Pass (no consensus errors) |
| google/gemma-3-1b-it | Today's best | Pass (no consensus errors) |
| google/gemma-3-4b-it | Today's Best | Pass (no consensus errors) |
| google/translategemma-12b-it | Today's Best | Pass (no consensus errors) |
| google/translategemma-4b-it | Search | Critical [Accuracy] "Search" Reason: Hypothesis 'Search' does not match source '今日最佳' (Best of the Day). Complete mistranslation. \| The translation does not match the source or reference meaning. \| Complete mistranslation - 'Search' does not convey '今日最佳' (Best of the Day) \| Mistranslation; '今日最佳' means 'Best of the Day', not 'Search'. \| The translation 'Search' does not match the source meaning 'Best of the Day'. It is likely a UI label error. |
| tencent/HY-MT1.5-1.8B | Translation of '今日最佳' in en | Critical [Accuracy] "Translation of '今日最佳' in en" Reason: The hypothesis does not provide a translation but merely states the intent to translate. \| Hypothesis is a meta-description rather than an actual translation. Should provide 'Best of the Day' or equivalent. \| The hypothesis is a meta-commentary describing the task rather than providing the actual translation of the source text. \| Hypothesis is a placeholder/template rather than an actual translation \| No actual translation given; only a meta-description about translating the phrase. |
| tencent/HY-MT1.5-7B | Today's Best | Minor [Style] "Today's Best" Reason: Acceptable but slightly different phrasing. Reference 'Best of the Day' is more natural in English. \| Acceptable but less idiomatic than “Best of the Day”; tense/collocation slightly unnatural for a fixed label. \| Style variation - 'Best of the Day' is more idiomatic in English than 'Today's Best' |
| Model under test | MT output (Hypothesis) | Adjudication (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | WSL provides a Linux shell environment with full access to system resources. | Minor [Accuracy] "WSL provides a Linux shell environment with full access to system resources." Reason: Hypothesis adds extensive explanation not present in source. Should be minimal translation/transliteration of 'WSL'. \| The hypothesis adds an explanatory sentence instead of translating the acronym 'WSL' as a proper noun, which is inappropriate for this context. \| Overly specific and added information not present in the source; source is just the name/acronym “WSL.” \| Mistranslation - source is an abbreviation (WSL) but hypothesis provides full definition instead of translation |
| CohereLabs/tiny-aya-water | WSL | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | WSL | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | WSL | Pass (no consensus errors) |
| Qwen/Qwen3-14B | WSL | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | WSL | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | WSL | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Windows Subsystem for Linux | Pass (no consensus errors) |
| google/gemma-3-12b-it | WSL | Pass (no consensus errors) |
| google/gemma-3-1b-it | WSL | Pass (no consensus errors) |
| google/gemma-3-4b-it | Windows Subsystem for Linux | Pass (no consensus errors) |
| google/translategemma-12b-it | WSL | Pass (no consensus errors) |
| google/translategemma-4b-it | WSL | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | WSL的翻译是 | Critical [Accuracy] "WSL的翻译是" Reason: The hypothesis does not provide a translation but merely states the intent to translate. \| Hypothesis is a meta-description in Chinese rather than the actual translation. Should be 'WSL' only. \| The hypothesis includes unnecessary explanatory text ('的翻译是') instead of just translating or retaining the acronym as per the reference. \| Meta text in Chinese stating 'The translation of WSL is' without providing the required term 'WSL' as in the reference. \| Hypothesis is a placeholder/template rather than an actual translation; source is WSL but hypothesis adds Chinese text |
| tencent/HY-MT1.5-7B | Windows Subsystem for Linux | Pass (no consensus errors) |
| Model under test | MT output (Hypothesis) | Adjudication (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | POE | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | POE | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | POE | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | POE | Minor [Style] "POE" Reason: Hypothesis uses all caps 'POE' while reference uses proper capitalization 'Poe'. For proper nouns/names, capitalization style should match reference. \| Inconsistency in capitalization compared to the reference 'Poe'. While 'POE' is often an acronym, the reference suggests a proper noun styling which the hypothesis ignores. \| All-caps differs from the reference “Poe”; depending on branding, case may matter. |
| Qwen/Qwen3-14B | POE | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | POE | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | POE | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Player Order Exchange | Major [Accuracy] "Player Order Exchange" Reason: The translation incorrectly interprets the acronym POE. \| The source 'POE' likely refers to the game 'Path of Exile' or the author 'Edgar Allan Poe'. 'Player Order Exchange' is a hallucinated expansion that does not match the common entity associated with the acronym in this context. \| Incorrect expansion of acronym. POE should be 'Poe' (proper noun/product name), not expanded to 'Player Order Exchange'. \| Unjustified expansion; source is an acronym with no evidence for this interpretation. \| Mistranslation - 'POE' is a proper noun (game name) that should remain as-is or transliterated as 'Poe' |
| google/gemma-3-12b-it | POE | Pass (no consensus errors) |
| google/gemma-3-1b-it | PoE | Pass (no consensus errors) |
| google/gemma-3-4b-it | Today | Critical [Accuracy] "Today" Reason: Complete mistranslation. The source 'POE' is a proper noun (likely referring to the game Path of Exile or the author Edgar Allan Poe), while the hypothesis translates it as 'Today', which is semantically unrelated. \| Complete mistranslation - 'POE' translated as 'Today' with no relation to source \| Completely incorrect; "POE" is a name/acronym and is mistranslated as a temporal adverb. \| Completely incorrect translation. 'POE' should be 'Poe' (a proper name/acronym), not 'Today'. \| Complete mistranslation of the acronym 'POE'. |
| google/translategemma-12b-it | Proof of Existence | Major [Accuracy] "Proof of Existence" Reason: Incorrect expansion of acronym. 'POE' should be 'Poe' (proper noun/name), not expanded to 'Proof of Existence'. \| The source 'POE' likely refers to the game 'Path of Exile' or the author 'Poe' given the reference 'Poe'. Expanding it to 'Proof of Existence' is a hallucination or incorrect expansion of the acronym in this context. \| Incorrect expansion of POE; reference shows it should remain as 'Poe' (proper name) \| "POE" here is a proper name ("Poe"), not the acronym "Proof of Existence"; meaning is changed. |
| google/translategemma-4b-it | Edit | Critical [Accuracy] "Edit" Reason: Mistranslation; should preserve acronym 'POE/Poe' instead of translating to an unrelated UI action. \| Incorrect translation - 'Edit' does not match 'POE' (proper noun Poe) \| Hypothesis 'Edit' does not match source 'POE' (Poe). Complete mistranslation. \| The translation does not match the source or reference meaning. \| The translation 'Edit' is unrelated to the source 'POE' (likely a proper noun or acronym). The reference suggests keeping it as 'Poe'. |
| tencent/HY-MT1.5-1.8B | 登录 | Critical [Accuracy] "登录" Reason: Completely incorrect meaning; '登录' means 'log in' and does not correspond to the acronym/name 'POE'. \| Complete mistranslation; source is POE (proper noun), hypothesis translates to 'login' which is completely unrelated \| The translation '登录' (Login) is completely unrelated to the source 'POE', which should be retained as a proper noun. \| Hypothesis '登录' (login) is completely incorrect. Should be 'Poe' - the acronym/proper noun for the source 'POE'. |
| tencent/HY-MT1.5-7B | POE | Pass (no consensus errors) |
| Model under test | MT output (Hypothesis) | Adjudication (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | GTP-54 | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | GTP-54 | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | GTP-54 | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | GTP-54 | Pass (no consensus errors) |
| Qwen/Qwen3-14B | GTP-54 | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | GTP-54 | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | GTP-54 | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | GTP-54 | Pass (no consensus errors) |
| google/gemma-3-12b-it | GTP-54 | Pass (no consensus errors) |
| google/gemma-3-1b-it | GTP-54 | Pass (no consensus errors) |
| google/gemma-3-4b-it | GTP-54 | Pass (no consensus errors) |
| google/translategemma-12b-it | GTP-54 | Pass (no consensus errors) |
| google/translategemma-4b-it | GTP-54 | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | 登录 | Critical [Accuracy] "登录" Reason: Hypothesis '登录' (login) is completely incorrect. Should be 'GTP-54' - the source identifier should be preserved. \| Wrong term; '登录' means 'log in' and does not match the source/reference string 'GTP-54'. \| The translation '登录' (Login) is completely unrelated to the source 'GTP-54', which should be retained as an identifier. \| Complete mistranslation; source is GTP-54 (proper noun), hypothesis translates to 'login' which is completely unrelated |
| tencent/HY-MT1.5-7B | GTP-54 | Pass (no consensus errors) |
| Model under test | MT output (Hypothesis) | Adjudication (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | This feature requires a login to use. | Critical [Accuracy] "This feature requires a login to use." Reason: Completely wrong meaning; “圆圆满满” expresses completeness/perfection, not login requirements. \| Complete mistranslation - source means 'Perfectly Complete' but hypothesis is unrelated login requirement text \| The hypothesis is unrelated to the source '圆圆满满' (Perfectly Complete), representing a total hallucination. \| Hypothesis is completely unrelated to source text. Should translate '圆圆满满' (perfectly complete) not a login requirement message. |
| CohereLabs/tiny-aya-water | Round and full | Major [Accuracy] "Round and full" Reason: Literal rendering misses idiomatic meaning of “圆圆满满” which conveys a sense of complete success, perfection, or everything going smoothly. \| Literal translation that misses the idiomatic meaning - source means 'perfectly complete' or 'flawless', not literally 'round and full' \| The translation does not match the source meaning. \| Hypothesis 'Round and full' is a literal translation but misses the idiomatic meaning. Reference 'Perfectly Complete' better captures the intended sense of '圆圆满满'. \| The translation is overly literal. '圆圆满满' is an idiom meaning 'perfectly complete', 'satisfactory', or 'flawless', not physically round and full. |
| Qwen/Qwen2.5-14B-Instruct | Perfectly Complete | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | 圆满圆满 | Minor [Accuracy] "圆满圆满" Reason: Hypothesis repeats '圆满' instead of maintaining the source's '圆圆满满' structure. This is a direct error in reproducing the source text. \| The phrase is not translated; it repeats a similar Chinese term instead of conveying the meaning “perfectly complete” in the target language. \| Redundant character - should be 圆圆满满 not 圆满圆满 \| The hypothesis repeats the source characters instead of translating them into the target language (English). It fails to convey the meaning 'Perfectly Complete'. |
| Qwen/Qwen3-14B | Perfectly Rounded | Major [Accuracy] "Perfectly Rounded" Reason: '圆圆满满' means perfectly complete/whole, not rounded. Reference correctly uses 'Perfectly Complete'. \| Mistranslation of the idiom '圆圆满满'. The source refers to completeness and perfection in life or events, not physical roundness. \| "圆圆满满" figuratively means "perfectly complete / everything goes smoothly and satisfactorily"; "Perfectly Rounded" implies shape rather than the idiomatic notion of completeness and fulfillment. \| Translation 'Rounded' does not capture the meaning of 圆满 (complete, satisfactory) as indicated in reference 'Perfectly Complete' |
| Qwen/Qwen3.5-4B | Complete and whole | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Complete and full | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Round and full | Major [Accuracy] "Round and full" Reason: Mistranslation - '圆圆满满' means perfectly complete/satisfactory, not literally round and full \| The translation is too literal. '圆圆满满' is an idiom meaning 'perfectly complete', 'satisfactory', or 'flawless'. 'Round and full' fails to convey the idiomatic meaning of completeness and success. \| The translation does not fully capture the meaning of 'Perfectly Complete'. \| Literal translation misses the idiomatic meaning. '圆圆满满' conveys completeness and perfection, better expressed as 'Perfectly Complete'. \| Literal translation misses idiomatic meaning of things being satisfactory / complete in a positive sense. |
| google/gemma-3-12b-it | Full and round |
严重
[准确性]
"Full and round"
理由: Translation is too literal and misses the idiomatic meaning of '圆圆满满' (perfect, complete, satisfactory) | The translation is overly literal. '圆圆满满' is an idiom meaning 'perfectly complete,' 'satisfactory,' or 'flawless' (often regarding outcomes or family reunions), not physically 'full and round'. | Literal translation misses the idiomatic meaning. '圆圆满满' conveys 'perfectly complete' or 'perfectly fulfilled', not just 'full and round'. | The translation does not fully capture the meaning of the original term. | Literal rendering misses the idiomatic sense of “圆圆满满” meaning perfectly complete / satisfactory; “full and round” is unnatural and does not convey the intended positive conclusion meaning. |
| google/gemma-3-1b-it | Full circle, full abundance |
严重
[准确性]
"Full circle, full abundance"
理由: The translation is overly literal and verbose. '圆圆满满' implies completeness and perfection, best captured by 'Perfectly Complete'. 'Full circle' has a different idiomatic meaning (returning to the start), which is not present in the source. | Over-translation; reference is more concise and accurate | Translation is inaccurate. '圆圆满满' means 'perfectly complete' or 'perfectly round and full', not 'full circle, full abundance'. | Conveys completeness and abundance but splits the phrase and adds nuance not clearly present; still close in meaning to 'Perfectly Complete'. |
| google/gemma-3-4b-it | Round and full |
严重
[准确性]
"Round and full"
理由: Literal rendering misses the idiomatic meaning of "圆圆满满" which conveys being complete/perfectly satisfactory, not physical roundness. | Literal translation that fails to capture the idiomatic meaning. '圆圆满满' signifies perfection, completeness, and satisfaction (often regarding family or life), not physical shape. 'Perfectly Complete' is the appropriate equivalent. | Literal translation - '圆圆满满' means perfectly complete/satisfactory, not literally round and full | Does not fully capture the intended meaning of 'Perfectly Complete'. | Literal translation misses the idiomatic meaning. '圆圆满满' conveys completeness and perfection, better captured by 'Perfectly Complete' (reference). |
| google/translategemma-12b-it | Full and happy | Minor<br>[Accuracy]<br>"Full and happy"<br>Reason: Inaccurate translation. '圆圆满满' means 'perfectly complete/whole', not 'full and happy'. Reference 'Perfectly Complete' is more accurate. \| The translation does not fully capture the meaning of '圆圆满满', which implies completeness and perfection. \| "圆圆满满" emphasizes completeness/perfection and being without regrets; "Full and happy" partially reflects positive feeling but loses the completeness / perfection nuance. \| Translation is too literal and loses the meaning of 圆圆满满 (perfect, complete) in reference 'Perfectly Complete' |
| google/translategemma-4b-it | Message | Critical<br>[Accuracy]<br>"Message"<br>Reason: Complete mistranslation - 'Message' has no relation to '圆圆满满' (Perfectly Complete) \| The translation 'Message' is completely unrelated to the source idiom '圆圆满满' (Perfectly Complete/Satisfactory). \| Hypothesis 'Message' does not match source '圆圆满满' (Perfectly Complete). Complete mistranslation. \| The translation does not match the source or reference meaning. \| Mistranslation; '圆圆满满' means 'perfectly complete' or 'fulfilling', not 'Message'. |
| tencent/HY-MT1.5-1.8B | Circle is complete | Major<br>[Accuracy]<br>"Circle is complete"<br>Reason: Translation is inaccurate. '圆圆满满' means 'perfectly complete' or 'perfectly round and full', not 'circle is complete'. Reference 'Perfectly Complete' is more accurate. \| The translation 'Circle is complete' does not match the reference 'Perfectly Complete'. \| The translation is too literal and misses the idiomatic meaning of '圆圆满满' (perfectly complete/satisfactory), resulting in unnatural phrasing. |
| tencent/HY-MT1.5-7B | Round and full | Major<br>[Accuracy]<br>"Round and full"<br>Reason: Mistranslation - literal translation misses the idiomatic meaning of '圆圆满满' meaning perfectly complete/satisfactory \| The translation is overly literal. '圆圆满满' is an idiom meaning 'perfectly complete', 'satisfactory', or 'flawless', not a physical description of shape. \| The translation is correct but less precise than the reference. \| Literal but inaccurate translation. '圆圆满满' is an idiomatic expression meaning 'perfectly complete' or 'perfectly fulfilled', not literally 'round and full'. \| Literal rendering misses the idiomatic sense of “圆圆满满” meaning ‘perfectly satisfactory/complete’; sounds unnatural in English. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Lightning protection | Major<br>[Accuracy]<br>"Lightning protection"<br>Reason: Hypothesis translates literal meaning but misses idiomatic sense. '避雷' means 'avoiding pitfalls/warnings' not literal 'lightning protection'. \| Literal meaning only; misses colloquial sense of “避雷” as avoiding pitfalls/bad recommendations. \| The translation 'Lightning protection' is the literal meaning of the characters but fails to capture the slang meaning of '避雷' (avoiding pitfalls/bad products) in this context. |
| CohereLabs/tiny-aya-water | Lightning protection | Major<br>[Accuracy]<br>"Lightning protection"<br>Reason: Translates the literal meaning but misses colloquial sense of “避雷” as ‘avoid pitfalls’ or ‘warning to stay away’, not physical lightning protection. \| The translation captures the literal meaning of the characters but misses the idiomatic/slang usage in modern Chinese context, which means 'avoiding pitfalls', 'steering clear of bad products/experiences', or 'warnings'. \| Literal translation missing the figurative meaning - source means 'avoiding pitfalls/warnings' in figurative sense, not literal lightning protection \| Hypothesis 'Lightning protection' is too literal. '避雷' in context means 'avoiding pitfalls/warnings' not literal lightning protection. Reference better captures the metaphorical meaning. |
| Qwen/Qwen2.5-14B-Instruct | Avoid Lightning | Major<br>[Accuracy]<br>"Avoid Lightning"<br>Reason: The translation is too literal and does not match the reference, which is 'Avoiding Pitfalls / Warnings'. \| “避雷” is an internet idiom meaning to avoid pitfalls/bad recommendations; literal “lightning” is incorrect in this context. \| This is a severe mistranslation. '避雷' in internet slang means to avoid bad products, scams, or pitfalls (literally 'avoid lightning strikes'), not to physically avoid lightning. The translation takes the literal meaning incorrectly, completely failing to convey the intended meaning of 'Avoiding Pitfalls / Warnings'. \| Literal translation missing the idiomatic meaning 'Avoiding Pitfalls / Warnings' \| Literal translation misses the idiomatic meaning. '避雷' means 'avoiding pitfalls/warnings' in context, not literal lightning avoidance. |
| Qwen/Qwen2.5-7B-Instruct | Lightning protection | Major<br>[Accuracy]<br>"Lightning protection"<br>Reason: “避雷” in internet slang means to avoid pitfalls or warn others away, not literal lightning protection. \| The translation conveys a different meaning than the reference. \| Hypothesis is too literal. '避雷' in this context means 'avoiding pitfalls/warnings' (metaphorical), not literal lightning protection. \| The source '避雷' in internet slang means 'avoiding pitfalls' or 'warnings about bad products/experiences'. The translation 'Lightning protection' is a literal physical interpretation that misses the contextual meaning entirely. \| Literal translation - '避雷' means to avoid pitfalls/warnings, not literal lightning protection |
| Qwen/Qwen3-14B | Avoiding lightning | Major<br>[Accuracy]<br>"Avoiding lightning"<br>Reason: The translation does not capture the intended meaning of avoiding pitfalls or warnings. \| Literal translation misses the idiomatic meaning of 避雷 (avoiding pitfalls/warnings) as indicated in reference \| Literal translation misses the idiomatic meaning of "avoiding pitfalls / traps / bad choices"; it incorrectly suggests physical avoidance of lightning instead of warnings or risk-avoidance. \| Literal translation error. '避雷' in this context is internet slang meaning 'avoiding pitfalls' or 'steering clear of bad products/experiences', not avoiding actual lightning. \| '避雷' is an idiom meaning to avoid pitfalls or warnings, not literal lightning. Reference correctly provides 'Avoiding Pitfalls / Warnings'. |
| Qwen/Qwen3.5-4B | Lightning avoidance | Major<br>[Accuracy]<br>"Lightning avoidance"<br>Reason: Literal translation of '避雷' (avoid lightning) misses the idiomatic meaning of 'avoiding pitfalls/warnings' in internet slang context \| The translation is somewhat accurate but not as precise as the reference. \| "避雷" in internet slang refers to avoiding pitfalls or bad recommendations; literal "lightning avoidance" is misleading and incorrect in this context. \| Literal translation error. '避雷' in this context means 'avoiding pitfalls' or 'warnings about bad products/experiences', not avoiding actual lightning. \| Hypothesis is too literal. '避雷' in context means avoiding pitfalls or warnings, not literal lightning avoidance. Reference provides the correct idiomatic meaning. |
| Qwen/Qwen3.5-9B | Avoid pitfalls | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Thunder Shelters | Major<br>[Accuracy]<br>"Thunder Shelters"<br>Reason: Mistranslation - '避雷' means to avoid pitfalls/warnings, not literal thunder shelters \| The source '避雷' is internet slang meaning 'to avoid pitfalls' or 'warn against bad products/experiences'. The translation 'Thunder Shelters' is a literal misinterpretation of the characters (avoid lightning) and completely misses the semantic meaning. \| The translation is incorrect; it should convey 'Avoiding Pitfalls / Warnings'. \| Incorrect translation. '避雷' means avoiding pitfalls or warnings, not physical shelters from thunder. \| Mistranslation; source means avoiding pitfalls or heeding warnings, not physical shelters from thunder. |
| google/gemma-3-12b-it | Lightning protection | Major<br>[Accuracy]<br>"Lightning protection"<br>Reason: Translation only captures literal meaning; '避雷' in internet slang means 'to avoid pitfalls/warnings' not literal lightning protection \| This is a severe mistranslation of internet slang. '避雷' in this context means 'to avoid pitfalls,' 'steer clear of bad products/experiences,' or 'warnings,' not the physical engineering concept of lightning protection. \| Incorrect meaning. '避雷' in this context means 'avoiding pitfalls' or 'warnings', not literal 'lightning protection'. \| Translates the literal meaning but misses the colloquial sense of “避雷” as ‘avoid pitfalls / steer clear (of something bad)’ rather than physical lightning protection. \| The translation does not match the reference meaning. |
| google/gemma-3-1b-it | 避雷 | Major<br>[Accuracy]<br>"避雷"<br>Reason: Failed to translate the source text into English. The hypothesis copies the Chinese source characters instead of providing the English equivalent as shown in the reference. \| Source text not translated to target language. Hypothesis should be 'Avoiding Pitfalls' or 'Warnings', not the Chinese source itself. \| The source should be translated into the target language; leaving it in Chinese does not express the meaning 'Avoiding Pitfalls / Warnings' in English. |
| google/gemma-3-4b-it | Avoid lightning | Major<br>[Accuracy]<br>"Avoid lightning"<br>Reason: Literal translation - '避雷' means avoid pitfalls/warnings, not literal lightning \| Literal mistranslation of an internet slang term. '避雷' in this context means to avoid bad products, scams, or pitfalls, not actual atmospheric lightning. The hypothesis completely misses the semantic intent. \| Does not match the reference meaning of 'Avoiding Pitfalls / Warnings'. \| Literal translation misses idiomatic meaning of "避雷" as avoiding pitfalls/bad recommendations or giving warnings. \| Overly literal translation. '避雷' in context means avoiding pitfalls or warnings, not literally avoiding lightning. Reference 'Avoiding Pitfalls / Warnings' is more accurate. |
| google/translategemma-12b-it | Lightning protection | Major<br>[Accuracy]<br>"Lightning protection"<br>Reason: In internet slang, '避雷' means to avoid bad products, scams, or pitfalls (literally 'avoid lightning'). Translating it as 'Lightning protection' interprets the phrase literally regarding physical lightning, missing the idiomatic meaning of 'Avoiding Pitfalls'. \| In this context "避雷" is internet slang meaning to avoid pitfalls or warn others away, not literal lightning protection. \| Literal but incomplete translation. '避雷' in context means 'avoiding pitfalls/warnings' (figurative), not literal 'lightning protection'. Reference captures the figurative meaning. \| Literal translation misses the idiomatic meaning of 避雷 (to avoid problems/pitfalls) in context; reference provides 'Avoiding Pitfalls / Warnings' |
| google/translategemma-4b-it | Search | Critical<br>[Accuracy]<br>"Search"<br>Reason: Complete mistranslation - 'Search' does not convey '避雷' (Avoiding Pitfalls/Warnings) \| The translation 'Search' contradicts the source '避雷' (Avoiding pitfalls/warnings). It is likely a UI label error. \| Hypothesis 'Search' does not match source '避雷' (Avoiding Pitfalls/Warnings). Complete mistranslation. \| '避雷' means 'avoid pitfalls' or 'warnings', not 'Search'. \| The translation does not match the source or reference meaning. |
| tencent/HY-MT1.5-1.8B | None | Critical<br>[Accuracy]<br>"None"<br>Reason: The hypothesis does not provide a translation but merely states 'None'. \| Missing translation; hypothesis is empty/null when source should be translated \| The hypothesis fails to provide any translation for the source text. \| No translation provided for '避雷'; expected meaning like 'avoiding pitfalls / warnings'. \| Hypothesis is 'None' (no translation provided). Should provide translation like 'Avoiding Pitfalls' or 'Warnings'. |
| tencent/HY-MT1.5-7B | Lightning Protection | Major<br>[Accuracy]<br>"Lightning Protection"<br>Reason: Mistranslation - literal translation of '避雷' misses the figurative meaning of avoiding pitfalls/warnings \| Inaccurate translation. '避雷' in this context means 'avoiding pitfalls' or 'warnings', not literal 'lightning protection'. The reference provides the correct meaning. \| The translation does not match the reference and is incorrect. \| Translates the literal meaning but misses the figurative internet sense of “避雷” as ‘avoid pitfalls / warnings / flagging bad things’. \| Complete mistranslation of context. In internet slang, '避雷' means 'to avoid pitfalls' or 'warn against bad products/experiences'. 'Lightning Protection' is the literal physical meaning, which is incorrect here. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Malaysia | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Malaysia | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Malaysia | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Malaysia | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Malaysia | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Malaysia | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Malaysia | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Malaysia | Pass (no consensus errors) |
| google/gemma-3-12b-it | Malaysia | Pass (no consensus errors) |
| google/gemma-3-1b-it | Malaysia | Pass (no consensus errors) |
| google/gemma-3-4b-it | Import | Critical<br>[Accuracy]<br>"Import"<br>Reason: Complete mistranslation of 'Malaysia'. \| Complete hallucination. The source 'Malaysia' is a country name, but the hypothesis translates it as 'Import', which is semantically unrelated. \| Completely incorrect translation. 'Malaysia' should remain 'Malaysia', not be translated to 'Import'. \| Completely wrong; "Malaysia" is a country name, unrelated to "Import". |
| google/translategemma-12b-it | Malaysia | Pass (no consensus errors) |
| google/translategemma-4b-it | Import | Critical<br>[Accuracy]<br>"Import"<br>Reason: The translation 'Import' is unrelated to the source 'Malaysia', which is a proper noun and should remain unchanged. \| Mistranslation; 'Malaysia' is a country name and should be kept as 'Malaysia', not 'Import'. \| Complete mistranslation - 'Import' has no relation to 'Malaysia' (proper noun) \| The translation does not match the source or reference meaning. \| Hypothesis 'Import' does not match source 'Malaysia'. Complete mistranslation. |
| tencent/HY-MT1.5-1.8B | Malaysia | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Malaysia | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | AIStudio | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | AIStudio | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | AIStudio | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | AIStudio | Pass (no consensus errors) |
| Qwen/Qwen3-14B | AIStudio | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | AI Studio | Minor<br>[Terminology]<br>"AI Studio"<br>Reason: The translation adds a space which may not be necessary. \| Hypothesis adds a space in 'AI Studio' while the reference maintains the original branding 'AIStudio' without space. Brand names should be preserved as-is. \| Space added to original term; reference keeps it as single word 'AIStudio' \| Source and reference use the single-word brand "AIStudio"; inserting a space alters the official name. |
| Qwen/Qwen3.5-9B | AI Studio | Minor<br>[Terminology]<br>"AI Studio"<br>Reason: Adds a space not present in the source/reference "AIStudio"; for product/brand names this is a minor terminology inconsistency. \| The translation adds a space, altering the original term. \| Reference keeps as single word 'AIStudio'; hypothesis adds space |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | AI Studio | Pass (no consensus errors) |
| google/gemma-3-12b-it | AIStudio | Pass (no consensus errors) |
| google/gemma-3-1b-it | AI Studio | Pass (no consensus errors) |
| google/gemma-3-4b-it | Done | Critical<br>[Accuracy]<br>"Done"<br>Reason: Complete mistranslation of 'AIStudio'. \| Completely incorrect translation. 'AIStudio' is a proper name/product name and should remain 'AIStudio', not be translated to 'Done'. \| Complete mistranslation - 'AIStudio' translated as 'Done' with no relation \| Completely incorrect; "AIStudio" is a product/name, not related to "Done". \| Complete hallucination. The source 'AIStudio' is a proper noun (product name), but the hypothesis translates it as 'Done', which is semantically unrelated. |
| google/translategemma-12b-it | AIStudio | Pass (no consensus errors) |
| google/translategemma-4b-it | AIStudio | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | AIStudio | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | AIStudio | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Novel | Minor<br>[Accuracy]<br>"Novel"<br>Reason: The source '小说' is a general category often translated as 'Novels' (plural) in UI contexts, whereas 'Novel' (singular) implies a single item, creating a slight mismatch with the reference. \| Number mismatch; “小说” in this context aligns with plural “Novels,” but meaning is otherwise correct. \| Hypothesis uses singular 'Novel' but reference uses plural 'Novels'. Source '小说' typically refers to the category/genre (plural). |
| CohereLabs/tiny-aya-water | Novel | Minor<br>[Accuracy]<br>"Novel"<br>Reason: The source '小说' can be singular or plural. The reference uses the plural 'Novels'. Without specific context, the singular 'Novel' is acceptable but slightly less precise if the category implies a collection. \| Singular form used while reference uses plural 'Novels' \| Hypothesis uses singular 'Novel' but source '小说' typically refers to the category/genre. Reference 'Novels' (plural) is more appropriate. \| Singular ‘Novel’ vs. likely category sense ‘Novels’; slight mismatch in number. |
| Qwen/Qwen2.5-14B-Instruct | Novel | Minor<br>[Accuracy]<br>"Novel"<br>Reason: The source '小说' is a general noun that can be singular or plural. The reference uses the plural 'Novels', which is often more appropriate for category labels. Using the singular 'Novel' is slightly less natural in this context but not a major error. \| The translation is singular while the reference is plural ('Novels'). \| Singular form used instead of plural. '小说' typically refers to novels in general (plural), and reference uses 'Novels'. \| The singular “Novel” is slightly narrower than the general category “小说,” which usually refers to novels in general; plural “Novels” would be more accurate. |
| Qwen/Qwen2.5-7B-Instruct | Novel | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Novel | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Novel | Minor<br>[Accuracy]<br>"Novel"<br>Reason: Number mismatch. The source '小说' is a general category often implying plural 'Novels' in this context, matching the reference, while 'Novel' implies a singular count. \| The translation uses singular while the reference uses plural. \| Source is unmarked for number and often used as a category; plural "Novels" is slightly more appropriate here, while singular may underrepresent the intended category meaning. \| Hypothesis uses singular 'Novel' while reference uses plural 'Novels'. The Chinese '小说' typically refers to the category/genre, making plural more appropriate. |
| Qwen/Qwen3.5-9B | Novel | Minor<br>[Accuracy]<br>"Novel"<br>Reason: Number mismatch. '小说' is plural in context, should be 'Novels' to match reference. \| Singular form 'Novel' used where the context or reference 'Novels' implies a category or plural sense common for '小说'. \| Source 小说 is generic and often plural in this context; singular "Novel" is slightly less accurate than plural "Novels" in the reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Novel | Minor<br>[Accuracy]<br>"Novel"<br>Reason: Number mismatch; source and reference imply a category/collection (plural) rather than a single item. \| The translation uses singular while the reference uses plural. \| '小说' is plural (novels), not singular. Should be 'Novels' to match reference. |
| google/gemma-3-12b-it | Novel | Pass (no consensus errors) |
| google/gemma-3-1b-it | 小说 | Major<br>[Accuracy]<br>"小说"<br>Reason: Failed to translate the source text into English. The hypothesis copies the Chinese source characters. \| No translation was performed; the Chinese term remains instead of conveying 'Novels' in the target language. \| Source text not translated to target language. Hypothesis should be 'Novels', not the Chinese source itself. |
| google/gemma-3-4b-it | Cart | Critical<br>[Accuracy]<br>"Cart"<br>Reason: Complete mistranslation. The source '小说' means 'Novels' or 'Fiction', while the hypothesis translates it as 'Cart', which is semantically unrelated. \| Completely incorrect translation. '小说' means 'Novels', not 'Cart'. \| Complete mistranslation - '小说' means novels, not cart \| Entirely wrong meaning; "小说" means novels/fiction, unrelated to a shopping cart. \| Complete mistranslation of '小说' (Novels). |
| google/translategemma-12b-it | Novel | Pass (no consensus errors) |
| google/translategemma-4b-it | Exit | Critical<br>[Accuracy]<br>"Exit"<br>Reason: The translation 'Exit' is completely unrelated to the source '小说' (Novels/Fiction). \| Complete mistranslation - 'Exit' does not convey '小说' (Novels) \| Mistranslation; '小说' means 'novels' or 'fiction', not 'Exit'. \| Hypothesis 'Exit' does not match source '小说' (Novels). Complete mistranslation. \| The translation does not match the source or reference meaning. |
| tencent/HY-MT1.5-1.8B | 小说 | Major<br>[Accuracy]<br>"小说"<br>Reason: The hypothesis repeats the source text instead of translating it into the target language (English) as implied by the reference. \| Source text is simply copied; no translation into English ('novels') is provided. \| Hypothesis is in Chinese (source language) but target language is English. Should be 'Novels'. \| Hypothesis retains source language instead of translating to target language (zh->en) |
| tencent/HY-MT1.5-7B | novel | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | rambling thoughts | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Whispering thoughts | Major<br>[Accuracy]<br>"Whispering thoughts"<br>Reason: The source '碎碎念' refers to nagging, rambling, or trivial chatter. 'Whispering thoughts' incorrectly interprets the meaning as quiet or secretive contemplation, missing the connotation of repetitive or trivial speech. \| Inaccurate translation - source means 'endless ramblings' or 'continuous muttering', not 'whispering thoughts' \| The translation does not match the source meaning. \| Does not capture the nuance of “碎碎碎念” which implies incessant or rambling muttering/complaining rather than quiet whispering. \| Hypothesis 'Whispering thoughts' misses the meaning of '碎碎碎念'. Reference 'Endless Ramblings' better captures the repetitive, fragmented nature of the phrase. |
| Qwen/Qwen2.5-14B-Instruct | Random Thoughts | Minor<br>[Accuracy]<br>"Random Thoughts"<br>Reason: Inaccurate translation. '碎碎碎念' emphasizes endless/continuous ramblings, not random thoughts. Reference 'Endless Ramblings' is more accurate. \| The translation is less precise than the reference, which is 'Endless Ramblings'. \| Captures the idea of miscellaneous musings but loses the repeated “碎碎碎” emphasis of endless/continuous chattering; closer to “endless ramblings.” |
| Qwen/Qwen2.5-7B-Instruct | rambling | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Ranting | Major<br>[Accuracy]<br>"Ranting"<br>Reason: '碎碎碎念' conveys endless, fragmented ramblings, not just ranting. Reference 'Endless Ramblings' is more accurate. \| Captures the idea of talking a lot, but "碎碎碎念" emphasizes continuous, nagging, fragmented muttering; "ranting" adds undue emotional intensity and misses the "碎碎" repetitive nuance. \| Translation is oversimplified; misses the repetitive/continuous nature of 碎碎碎念 (endless ramblings) as indicated in reference |
| Qwen/Qwen3.5-4B | Nagging | Major<br>[Accuracy]<br>"Nagging"<br>Reason: Translation misses the repetitive/rambling aspect of '碎碎碎念' - 'nagging' implies complaining to someone else, not self-rambling or endless muttering \| Semantic error. '碎碎念' refers to muttering, rambling, or trivial chatter, whereas 'Nagging' implies persistent annoyance or scolding, which changes the tone significantly. \| Hypothesis 'Nagging' misses the meaning of '碎碎碎念' which refers to endless ramblings or fragmented thoughts. 'Endless Ramblings' is more accurate. \| The translation is somewhat accurate but not as precise as the reference. \| "碎碎碎念" conveys continuous, fragmented rambling; "nagging" focuses on persistent complaining to someone and loses the self-talk/rambling nuance. |
| Qwen/Qwen3.5-9B | Muttering to oneself | Major<br>[Accuracy]<br>"Muttering to oneself"<br>Reason: Translation misses the repetitive/continuous aspect implied in '碎碎碎念' (endless ranting); reference 'Endless Ramblings' captures this better \| Misses the key meaning of '碎碎碎念' which emphasizes endless, rambling thoughts. 'Endless Ramblings' is more accurate. \| Conveys internal speech but misses the sense of being long-winded/repetitive in 碎碎碎念, which implies endless rambling chatter. \| The translation 'Muttering to oneself' fails to capture the repetitive and often trivial nature of '碎碎念' (endless ramblings/nagging), losing the specific connotation of the reduplicated characters. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Scattered thoughts | Pass (no consensus errors) |
| google/gemma-3-12b-it | Rambling thoughts | Pass (no consensus errors) |
| google/gemma-3-1b-it | Translation of '碎碎碎念' in en | Critical<br>[Accuracy]<br>"Translation of '碎碎碎念' in en"<br>Reason: Meta-text indicating translation rather than providing the translation itself; completely fails to convey the source meaning. \| Placeholder text instead of actual translation \| Incorrect translation; should be 'Endless Ramblings'. \| Hypothesis is a meta-description, not an actual translation. Should be 'Endless Ramblings' or similar. \| The hypothesis is a meta-comment describing the task rather than performing the translation. It fails to provide the actual translation. |
| google/gemma-3-4b-it | Random thoughts | Pass (no consensus errors) |
| google/translategemma-12b-it | Ramblings | Minor<br>[Accuracy]<br>"Ramblings"<br>Reason: Missing intensity/repetition aspect. '碎碎碎念' emphasizes endless/continuous ramblings through repetition. Reference 'Endless Ramblings' better captures this. \| "碎碎碎念" intensifies the idea of continuous or endless rambling; "Ramblings" misses the repetitive / incessant nuance. \| Translation is missing the repetitive/emphatic nuance of 碎碎碎念; reference 'Endless Ramblings' captures this better |
| google/translategemma-4b-it | Random thoughts | Minor<br>[Accuracy]<br>"Random thoughts"<br>Reason: Partially accurate but misses the sense of being repetitive/constant implied by '碎碎碎念' (endless ramblings or nagging). \| The translation is somewhat close but not fully accurate. \| Partial translation - 'Random thoughts' is close but reference suggests 'Endless Ramblings' as more accurate \| Hypothesis 'Random thoughts' partially captures meaning but differs from reference 'Endless Ramblings'. Less accurate translation of '碎碎碎念'. |
| tencent/HY-MT1.5-1.8B | Translation of '碎碎碎念' in en | Critical<br>[Accuracy]<br>"Translation of '碎碎碎念' in en"<br>Reason: The hypothesis does not provide a translation but merely states the intent to translate. \| The hypothesis is a meta-commentary describing the task rather than providing the actual translation of the source text. \| Hypothesis is a meta-description rather than an actual translation. Should provide 'Endless Ramblings' or equivalent. \| Hypothesis is a placeholder/template rather than an actual translation \| Only a meta-statement; does not convey the intended meaning 'endless ramblings'. |
| tencent/HY-MT1.5-7B | "en": "Chatter on and on", "zh-TW": "不停唠叨", "ja": "没完没了地闲聊", "ko": "不停地碎碎念", "de": "Endless chatter", "fr": "Babbling without stop", "it":="Continual mumbling" }\|im_end\|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | FAQ | Critical<br>[Accuracy]<br>"FAQ"<br>Reason: Complete mistranslation - source means 'Financial News' but hypothesis is 'FAQ' with no relation \| The source text '金融消息' means 'Financial News', but the hypothesis translates it as 'FAQ', which is a complete mistranslation of the meaning. \| Hypothesis is completely wrong. Should translate '金融消息' (Financial News) not 'FAQ'. \| Wrong concept; “金融消息” means “Financial News,” not “FAQ.” |
| CohereLabs/tiny-aya-water | Financial news | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Financial Messages | Major<br>[Accuracy]<br>"Financial Messages"<br>Reason: The translation is less common than the reference, which is 'Financial News'. \| Incorrect word choice. '消息' in this context means 'news', not 'messages'. Reference 'Financial News' is correct. \| “金融消息” in this context refers to financial news/information reports; “messages” suggests direct communications rather than news. |
| Qwen/Qwen2.5-7B-Instruct | Financial News | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Financial News | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Financial News | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Financial News | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Financial News | Pass (no consensus errors) |
| google/gemma-3-12b-it | Financial News | Pass (no consensus errors) |
| google/gemma-3-1b-it | Financial news | Pass (no consensus errors) |
| google/gemma-3-4b-it | Financial News | Pass (no consensus errors) |
| google/translategemma-12b-it | Financial News | Pass (no consensus errors) |
| google/translategemma-4b-it | Promotion | Critical<br>[Accuracy]<br>"Promotion"<br>Reason: Hypothesis 'Promotion' does not match source '金融消息' (Financial News). Complete mistranslation. \| Complete mistranslation - 'Promotion' does not convey '金融消息' (Financial News) \| The translation 'Promotion' is incorrect; the source '金融消息' means 'Financial News'. \| The translation does not match the source or reference meaning. \| Mistranslation; '金融消息' means 'financial news', not 'promotion'. |
| tencent/HY-MT1.5-1.8B | 金融消息的翻译 | Critical<br>[Accuracy]<br>"金融消息的翻译"<br>Reason: The hypothesis does not provide a translation but merely states the intent to translate. \| Hypothesis is a placeholder/template rather than an actual translation; adds Chinese text \| Chinese meta-text 'translation of financial news' instead of actually translating to 'financial news'. \| The hypothesis includes unnecessary explanatory text ('的翻译') instead of providing the direct translation. \| Hypothesis is a meta-description in Chinese rather than the actual translation. Should be 'Financial News'. |
| tencent/HY-MT1.5-7B | Financial News | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Otter script | Critical<br>[Accuracy]<br>"Otter script"<br>Reason: Severe mistranslation; “油猴脚本” refers to Tampermonkey scripts, not anything related to otters. \| The source '油猴脚本' refers to 'Tampermonkey Scripts' (a specific browser extension). 'Otter script' is a hallucination or incorrect translation that loses the specific entity reference. \| Hypothesis is incorrect. '油猴脚本' means 'Tampermonkey Scripts' not 'Otter script'. Misidentified the term. |
| CohereLabs/tiny-aya-water | Oil Monkey Script | Major<br>[Terminology]<br>"Oil Monkey Script"<br>Reason: Incorrect term for well-known browser extension; should be ‘Tampermonkey script(s)’ or at least ‘Tampermonkey’ transliteration, not literal “Oil Monkey”. \| The translation should be 'Tampermonkey Scripts'. \| Wrong translation - '油猴' is the Chinese brand name for Tampermonkey, should be 'Tampermonkey Scripts' |
| Qwen/Qwen2.5-14B-Instruct | Tampermonkey Script | Minor<br>[Accuracy]<br>"Tampermonkey Script"<br>Reason: Source is a generic category; singular “Script” is slightly less accurate than plural “Scripts” as in the reference, though meaning is clear. \| The translation is singular while the reference is plural ('Tampermonkey Scripts'). \| Singular form used instead of plural. '油猴脚本' refers to multiple scripts, and reference uses 'Tampermonkey Scripts'. |
| Qwen/Qwen2.5-7B-Instruct | GreaseMonkey script |
严重
[术语]
"GreaseMonkey script"
理由: Hypothesis uses 'GreaseMonkey' but reference uses 'Tampermonkey'. While both are script managers, '油猴脚本' specifically refers to Tampermonkey in modern Chinese contexts. | While 'GreaseMonkey' is the original extension, 'Tampermonkey' is the more common modern equivalent referenced. However, the bigger issue is that '油猴' is the specific Chinese nickname for these scripts; 'GreaseMonkey' is an acceptable equivalent, but the reference prefers 'Tampermonkey'. This is a minor terminology preference. | The translation uses a different term than the reference. | Uses a related but different userscript manager name; “油猴脚本” is often associated with Tampermonkey or generic userscripts, so brand choice may be slightly off. | Wrong tool name - should be 'Tampermonkey' not 'GreaseMonkey' |
| Qwen/Qwen3-14B | Tampermonkey Script | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | UserScript |
严重
[准确性]
"UserScript"
理由: Hypothesis uses generic 'UserScript' while '油猴脚本' specifically refers to Tampermonkey scripts. The reference is more accurate and specific. | Terminology error. '油猴' specifically refers to 'Tampermonkey'. While UserScript is the underlying technology, the specific term 'Tampermonkey Scripts' is the accurate equivalent for the colloquial '油猴脚本'. | The translation is somewhat accurate but not as precise as the reference. |
| Qwen/Qwen3.5-9B | Tampermonkey script | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Oil Monkey Script |
轻微
[准确性]
"Oil Monkey Script"
理由: Literal translation is incorrect. '油猴脚本' refers to Tampermonkey scripts (a browser extension), not literal oil monkey scripts. | The source '油猴脚本' refers to scripts for the browser extension 'Tampermonkey' (known as 'Youhou' in Chinese). Translating it literally as 'Oil Monkey' is incorrect; the established English term is 'Tampermonkey Scripts'. | The translation incorrectly interprets '油猴脚本' as 'Oil Monkey Script' instead of 'Tampermonkey Scripts'. | Mistranslation - '油猴' is the Chinese name for 'Tampermonkey', a browser extension, not literal oil monkey |
| google/gemma-3-12b-it | Tampermonkey script | 通过 (无共识错误) |
| google/gemma-3-1b-it | Oil Monkey Script |
严重
[术语]
"Oil Monkey Script"
理由: Incorrect terminology; should be Tampermonkey (a brand name), not literal translation | Literal rendering of the Chinese nickname; standard term for the browser extension is 'Tampermonkey script', so this is misleading terminology. | Incorrect translation; should be 'Tampermonkey Scripts'. |
| google/gemma-3-4b-it | UserScript |
严重
[准确性]
"UserScript"
理由: Partially inaccurate. '油猴脚本' specifically refers to Tampermonkey scripts, not generic UserScripts. Reference 'Tampermonkey Scripts' is more precise. | Imprecise terminology. '油猴' specifically refers to the Tampermonkey extension. While Tampermonkey uses UserScripts, translating '油猴脚本' simply as 'UserScript' loses the specific brand reference intended by the source. 'Tampermonkey Scripts' is more accurate. | Does not fully capture the specificity of 'Tampermonkey Scripts'. |
| google/translategemma-12b-it | Tampermonkey script | 通过 (无共识错误) |
| google/translategemma-4b-it | Resource |
轻微
[准确性]
"Resource"
理由: The translation does not match the source or reference meaning. | The translation 'Resource' is too generic. The source '油猴脚本' specifically refers to 'Tampermonkey Scripts'. | Mistranslation; '油猴脚本' specifically refers to Tampermonkey scripts or userscripts, not generic 'resource'. | Hypothesis 'Resource' does not match source '油猴脚本' (Tampermonkey Scripts). Complete mistranslation. |
| tencent/HY-MT1.5-1.8B | Oil Monkey Script |
严重
[术语]
"Oil Monkey Script"
理由: Incorrect terminology; '油猴' is the Chinese name for Tampermonkey, should be 'Tampermonkey Scripts' | Literal translation of '油猴' is incorrect; established term is 'Tampermonkey Scripts'. | The translation 'Oil Monkey Script' does not match the reference 'Tampermonkey Scripts'. | The term '油猴脚本' refers to the specific software 'Tampermonkey'; translating it literally as 'Oil Monkey' is incorrect terminology. |
| tencent/HY-MT1.5-7B | Oil Monkey Script |
严重
[准确性]
"Oil Monkey Script"
理由: Mistranslation - '油猴' is a nickname for Tampermonkey, not literal oil monkey | Incorrect translation. '油猴脚本' refers to 'Tampermonkey Scripts' (a browser extension), not 'Oil Monkey Script'. This is a mistranslation of the proper noun. | Literal mistranslation of a proper noun. '油猴' refers to the browser extension 'Tampermonkey', not an 'Oil Monkey'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Daily | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Daily |
轻微
[准确性]
"Daily"
理由: Hypothesis 'Daily' is incomplete. Reference 'Daily Life' better captures the full meaning of '日常'. | The translation should be 'Daily Life'. | Too vague; “日常” usually refers to ‘daily life’ or ‘everyday stuff’, not just ‘daily’ as an adjective. | Incomplete translation - reference includes 'Life' making it 'Daily Life', hypothesis only has 'Daily' |
| Qwen/Qwen2.5-14B-Instruct | Daily |
轻微
[准确性]
"Daily"
理由: The translation is less precise than the reference, which is 'Daily Life'. | Incomplete translation. '日常' means 'daily life' or 'everyday', not just 'daily'. Reference 'Daily Life' is more complete. | “日常” in this context typically implies “daily life/everyday matters”; “Daily” alone is vague and less specific. |
| Qwen/Qwen2.5-7B-Instruct | Daily |
轻微
[准确性]
"Daily"
理由: Incomplete translation - reference includes 'Life' | Hypothesis is incomplete. '日常' means 'daily life' or 'everyday', not just 'Daily'. Reference 'Daily Life' is more complete. | The translation is too brief compared to the reference. | “日常” often conveys “daily life” or “everyday stuff”; “Daily” alone is underspecified but understandable. |
| Qwen/Qwen3-14B | Daily |
轻微
[准确性]
"Daily"
理由: Incomplete translation. '日常' as a noun phrase usually refers to 'Daily Life' or 'Routine', whereas 'Daily' is primarily an adjective. | Incomplete translation; missing 'life' component indicated in reference 'Daily Life' | "日常" in this context usually means "daily life / everyday stuff"; bare adjective "Daily" is vague and incomplete as a standalone noun phrase. | '日常' refers to daily life/routine, not just 'Daily'. Reference 'Daily Life' is more complete. |
| Qwen/Qwen3.5-4B | Daily | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Daily |
轻微
[准确性]
"Daily"
理由: "Daily" suggests frequency or a category, while 日常 here more naturally means "daily life" or "everyday"; semantic nuance is slightly off. | The translation 'Daily' is an adjective, whereas '日常' in this context functions as a noun phrase meaning 'Daily Life' or 'Routine', as indicated by the reference. | Incomplete translation. '日常' means 'daily life' or 'everyday', not just 'daily'. Reference 'Daily Life' is more complete. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Everyday life | 通过 (无共识错误) |
| google/gemma-3-12b-it | Daily |
轻微
[准确性]
"Daily"
理由: “日常” usually refers to ‘daily life / everyday stuff’; “Daily” alone is vague and lacks the ‘life/everyday’ nuance. | While 'Daily' is an adjective, the source '日常' as a noun phrase often implies 'Daily Life' or 'Routine'. 'Daily' alone feels incomplete as a noun equivalent in this context. | Incomplete translation. '日常' means 'daily life' or 'everyday', not just 'Daily'. Reference provides fuller meaning. | The translation lacks the context provided in the reference. |
| google/gemma-3-1b-it | Daily |
轻微
[准确性]
"Daily"
理由: Translation is incomplete. Reference 'Daily Life' is more complete and accurate than just 'Daily'. | Conveys frequency but omits the implied aspect of life/routine present in 'Daily Life'; meaning is slightly underspecified. | Incomplete translation; reference includes 'Life' | The translation is slightly incomplete. '日常' in this context often implies 'Daily Life' or 'Routine', as captured in the reference. 'Daily' alone feels like an adjective missing a noun. |
| google/gemma-3-4b-it | View |
致命
[准确性]
"View"
理由: Complete mistranslation. The source '日常' means 'Daily Life' or 'Routine', while the hypothesis translates it as 'View', which is semantically unrelated. | Incorrect meaning; "日常" refers to daily life/routine, not a view. | Completely incorrect translation. '日常' means 'Daily Life', not 'View'. | Complete mistranslation - '日常' means daily life, not view | Does not match the reference meaning of 'Daily Life'. |
| google/translategemma-12b-it | Daily |
轻微
[准确性]
"Daily"
理由: '日常' as a noun often refers to 'Daily Life' or 'Routine'. Translating it as the adjective 'Daily' loses the nominal sense present in the source and the reference. | Reference 'Daily Life' is more complete; hypothesis drops 'life' component | Incomplete translation. '日常' means 'daily life' or 'everyday', not just 'daily'. Reference 'Daily Life' is more complete. | "日常" in this context usually refers to everyday life or routine; "Daily" is vague and under-translates the sense of daily life / everyday matters. | The translation is accurate but could be more precise by including 'Life' as in the reference. |
| google/translategemma-4b-it | Daily |
轻微
[准确性]
"Daily"
理由: Hypothesis 'Daily' is incomplete compared to reference 'Daily Life'. Missing 'Life' component. | The translation is somewhat close but not fully accurate. | Incomplete translation - 'Daily' is partial; reference suggests 'Daily Life' | Translation is somewhat incomplete; '日常' in this context is closer to 'daily life' or 'everyday stuff' than the bare adjective 'Daily'. |
| tencent/HY-MT1.5-1.8B | 日常 |
严重
[准确性]
"日常"
理由: The hypothesis repeats the source text instead of translating it into the target language (English) as implied by the reference. | Source text repeated with no translation into English ('daily life'). | Hypothesis retains source language instead of translating to target language (zh->en) | Hypothesis is in Chinese (source language) but target language is English. Should be 'Daily Life'. |
| tencent/HY-MT1.5-7B | Daily routine |
轻微
[准确性]
"Daily routine"
理由: '日常' in this context usually refers to 'Daily Life' or general everyday occurrences. 'Daily routine' implies a specific schedule of tasks, which is slightly narrower than the source. | Slightly inaccurate. '日常' means 'daily life' more broadly, not specifically 'daily routine'. Reference 'Daily Life' is more accurate. | The translation is correct but less precise than the reference. | “日常” here is closer to ‘daily life/everyday things’; “daily routine” narrows the meaning to habits/schedule. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Translate this sentence to French |
致命
[准确性]
"Translate this sentence to French"
理由: Complete mistranslation - source means 'Legendary/Elite Posts' but hypothesis is unrelated instruction to translate to French | Unrelated instruction; does not convey “精华神帖” (legendary/elite posts). | The hypothesis contains an instruction prompt instead of translating the source text '精华神帖' (Legendary/Elite Posts). This is a complete failure to translate. | Hypothesis is a meta-instruction, completely unrelated to source text. Should translate '精华神帖' (Legendary/Elite Posts). |
| CohereLabs/tiny-aya-water | Essential Gold Post |
严重
[准确性]
"Essential Gold Post"
理由: Inaccurate translation - source means 'legendary/elite posts', not 'essential gold post' | Hypothesis 'Essential Gold Post' is inaccurate. '精华神帖' refers to elite/legendary posts. Reference 'Legendary / Elite Posts' is more appropriate. | The translation does not match the source meaning. | The translation is overly literal. '精华' implies 'featured' or 'elite', and '神帖' implies 'legendary' or 'god-tier' posts. 'Essential Gold Post' fails to capture the community slang nuance of high-quality, legendary content. |
| Qwen/Qwen2.5-14B-Instruct | Essence Divine Post |
严重
[准确性]
"Essence Divine Post"
理由: The translation is less precise than the reference, which is 'Legendary / Elite Posts'. | Fails to convey the forum-specific meaning of curated, legendary high-quality posts; should be something like “legendary/elite posts.” | Literal translation not matching reference meaning 'Legendary / Elite Posts' | This is a literal and awkward translation. '精华' means 'essence' or 'highlighted', and '神帖' is slang for 'god-tier' or 'legendary' posts. 'Divine' is an incorrect register for internet slang, and the phrase 'Essence Divine Post' is unnatural. 'Legendary / Elite Posts' captures the meaning and tone correctly. | Awkward and inaccurate translation. '精华神帖' refers to legendary or elite posts, not 'essence divine post'. Reference is significantly more accurate. |
| Qwen/Qwen2.5-7B-Instruct | Essence Divine帖 |
严重
[流畅性]
"Essence Divine帖"
理由: Hypothesis contains mixed language (English and Chinese characters). '帖' should be translated as 'Posts', not left in Chinese. Should be 'Essence Divine Posts' or similar. | The translation is awkward and unclear. | Mixes English with an untranslated Chinese character “帖” and unnatural phrase “Essence Divine”, resulting in ungrammatical, unclear output.
严重
[准确性]
"Essence Divine帖"
理由: The hypothesis fails to translate the character '帖' (post), leaving it in Chinese. Additionally, 'Essence Divine' is an awkward literal translation of '精华神', whereas 'Legendary / Elite Posts' captures the intended meaning. | Fails to convey the notion of “legendary / elite posts”; “Essence Divine” does not match the intended meaning. | Untranslated character '帖' and awkward compound 'Essence Divine' |
| Qwen/Qwen3-14B | Essential Posts |
轻微
[准确性]
"Essential Posts"
理由: Conveys importance but misses the colloquial nuance of "神帖" as legendary, standout, or exceptionally good posts; tone is more neutral and less emphatic. | '精华神帖' refers to legendary/elite posts, not just essential. Reference 'Legendary / Elite Posts' is more accurate. | Translation does not capture the superlative/legendary connotation of 精华神帖 as indicated in reference 'Legendary / Elite Posts' | Under-translation of intensity. '神帖' implies 'god-tier' or 'legendary' posts, which is stronger than just 'Essential'. |
| Qwen/Qwen3.5-4B | Essential Post |
轻微
[准确性]
"Essential Post"
理由: Hypothesis 'Essential Post' is too generic. '精华神帖' refers to legendary or elite posts, capturing the sense of exceptional quality and prestige better than 'Essential'. | Translation captures 'essential' but misses the legendary/elite connotation in reference | The translation is somewhat accurate but not as precise as the reference. | "精华神帖" implies highly celebrated, legendary-quality posts; "essential post" sounds merely important or curated and misses the strong praise and community-meme connotation. |
| Qwen/Qwen3.5-9B | Essential Posts |
严重
[准确性]
"Essential Posts"
理由: Misses the connotation of '精华神帖' which emphasizes legendary/elite quality. Reference 'Legendary / Elite Posts' better captures the meaning. | Translation does not capture the connotation of excellence/legendary status in '精华神帖'; reference 'Legendary / Elite Posts' is more accurate | Meaning is close but 精华神帖 emphasizes extraordinarily good/legendary quality; "Essential" underplays the hyperbolic, standout nature. | The term '精华' implies high quality or curated content ('Elite'/'Legendary'), and '神帖' implies legendary status. 'Essential' understates the hyperbolic praise inherent in '神' (god/divine). |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Essential Divine Posts |
严重
[准确性]
"Essential Divine Posts"
理由: The translation is overly literal. '神帖' in internet slang means 'legendary', 'amazing', or 'elite' posts, not literally 'divine'. 'Essential Divine Posts' sounds unnatural and misrepresents the tone. | Inaccurate translation. '精华神帖' refers to legendary or elite posts, not 'essential divine' posts. Misses the cultural connotation. | Over-translation - '精华神帖' means legendary/elite posts, not essential divine posts |
| google/gemma-3-12b-it | Featured Posts |
轻微
[准确性]
"Featured Posts"
理由: Translation misses the superlative/legendary connotation of '精华神帖' (elite/legendary top posts) | Conveys high-quality posts but misses the stronger praise in “精华神帖” which implies legendary or exceptionally classic posts, not just ‘featured’. | Incorrect meaning. '精华神帖' refers to legendary or elite posts, not just 'featured posts'. '神帖' specifically means legendary/divine posts. | The translation is more generic than the reference. | '精华' implies 'essence' or 'elite/high-quality', and '神帖' implies 'legendary/god-tier posts'. 'Featured Posts' understates the intensity and community reverence implied by '神' (god/legendary). |
| google/gemma-3-1b-it | Elegant Manuscript |
严重
[准确性]
"Elegant Manuscript"
理由: Severe mistranslation. '精华' means essence/elite/highlight, not 'Elegant'. '神帖' means legendary/god-tier post, not 'Manuscript'. The meaning is completely lost. | Incorrect translation; reference indicates legendary/elite posts | Translation is inaccurate. '精华神帖' means 'Legendary/Elite Posts', not 'Elegant Manuscript'. | Translation does not match the reference. | Misinterprets the phrase; refers to important/legendary forum posts rather than elegance or manuscripts. |
| google/gemma-3-4b-it | Featured Post |
轻微
[准确性]
"Featured Post"
理由: Inaccurate translation. '精华神帖' refers to legendary or elite posts, not just featured posts. Reference 'Legendary / Elite Posts' is more accurate. | Conveys prominence but under-translates intensity of "精华神帖" which implies legendary/exceptional quality, not just featured. | Under-translation of intensity. '精华' means featured/essential, but '神帖' elevates it to 'god-tier' or 'legendary'. 'Featured Post' captures the first part but misses the hyperbolic praise of '神' (God/Legendary) present in the source and reference. |
| google/translategemma-12b-it | Featured Posts |
轻微
[准确性]
"Featured Posts"
理由: Translation does not capture the elite/legendary connotation of 精华神帖; reference 'Legendary / Elite Posts' is more accurate | Inaccurate translation. '精华神帖' refers to legendary/elite posts, not just featured posts. Reference 'Legendary / Elite Posts' is more accurate. | The translation is correct but could be more precise by including 'Legendary / Elite' as in the reference. | '精华' means 'essence' or 'elite', and '神帖' implies 'legendary' or 'god-tier' posts. 'Featured Posts' is a standard forum term but understates the intensity of '神' (god/legendary) compared to 'Legendary / Elite Posts'. | "精华神帖" implies exceptionally high-quality, legendary or epic posts; "Featured Posts" is weaker and loses the "legendary" / "god-tier" emphasis. |
| google/translategemma-4b-it | View |
致命
[准确性]
"View"
理由: Complete mistranslation - 'View' does not convey '精华神帖' (Legendary/Elite Posts) | Mistranslation; '精华神帖' refers to 'legendary/elite posts', not 'View'. | Hypothesis 'View' does not match source '精华神帖' (Legendary/Elite Posts). Complete mistranslation. | The translation 'View' is a UI action, whereas the source '精华神帖' refers to specific content ('Legendary/Elite Posts'). | The translation does not match the source or reference meaning. |
| tencent/HY-MT1.5-1.8B | Essence Divine Post | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Essential and impressive posts | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Hugging Face | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Hugging Face | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | HuggingFace | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | HuggingFace | 通过 (无共识错误) |
| Qwen/Qwen3-14B | HuggingFace | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | HuggingFace | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | HuggingFace | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Hugging Face |
轻微
[术语]
"Hugging Face"
理由: Product/brand name is conventionally written as a single word "HuggingFace" in this context. | Unnecessary expansion - 'HuggingFace' is a brand name that should remain as-is | Added space breaks the proper noun. Should be 'HuggingFace' as a single brand name. |
| google/gemma-3-12b-it | HuggingFace | 通过 (无共识错误) |
| google/gemma-3-1b-it | Hugging Face | 通过 (无共识错误) |
| google/gemma-3-4b-it | Hugging Face | 通过 (无共识错误) |
| google/translategemma-12b-it | HuggingFace | 通过 (无共识错误) |
| google/translategemma-4b-it | Hugging Face |
轻微
[风格]
"Hugging Face"
理由: Style inconsistency - reference keeps compound form 'HuggingFace' without space | Hypothesis 'Hugging Face' has spacing while reference 'HuggingFace' is camelCase. Minor formatting difference. | The hypothesis adds a space ('Hugging Face') while the source and reference use the concatenated brand name 'HuggingFace'. While readable, it deviates from the specific branding in the source. |
| tencent/HY-MT1.5-1.8B | Hugging Face |
轻微
[术语]
"Hugging Face"
理由: The official brand name is typically stylized as 'HuggingFace' (one word) as shown in the source and reference, though the two-word version is understandable. | Spacing differs from reference brand form 'HuggingFace'; could be considered a minor terminology inconsistency. | Hypothesis uses space in 'Hugging Face' but reference uses no space 'HuggingFace'. The proper brand name is 'HuggingFace'. |
| tencent/HY-MT1.5-7B | HuggingFace | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | copaw | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | copaw | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | copaw | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | copaw | 通过 (无共识错误) |
| Qwen/Qwen3-14B | copaw | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Copaw |
轻微
[风格]
"Copaw"
理由: Hypothesis capitalizes 'Copaw' while reference maintains lowercase 'copaw'. Brand/product names should preserve original capitalization. | Capitalization difference - reference uses lowercase 'copaw' while hypothesis capitalizes first letter | Capitalization inconsistency. The hypothesis uses Title Case 'Copaw' while the reference preserves the source's lowercase 'copaw'. |
| Qwen/Qwen3.5-9B | copaw | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Sign up |
致命
[准确性]
"Sign up"
理由: The translation incorrectly interprets 'copaw' as 'Sign up'. | Completely incorrect translation. 'copaw' is a proper noun/brand name and should not be translated to 'Sign up'. | The source 'copaw' appears to be a proper noun (likely a project name or typo for 'copilot'). The translation 'Sign up' is a complete hallucination with no semantic link to the source text. | Complete mistranslation; replaces the source string with unrelated content. | Mistranslation - 'copaw' is a proper noun that should remain as-is, not translated as 'Sign up' |
| google/gemma-3-12b-it | copaw | 通过 (无共识错误) |
| google/gemma-3-1b-it | copaw | 通过 (无共识错误) |
| google/gemma-3-4b-it | copaw | 通过 (无共识错误) |
| google/translategemma-12b-it | copaw | 通过 (无共识错误) |
| google/translategemma-4b-it | copaw | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | 登录 |
致命
[准确性]
"登录"
理由: The translation '登录' (Login) is completely unrelated to the source 'copaw', which should be retained as a proper noun. | Complete mistranslation; source is copaw (proper noun), hypothesis translates to 'login' which is completely unrelated | Incorrect translation; '登录' ('log in') is unrelated to the name 'copaw'. | Hypothesis '登录' (login) is completely incorrect. Should be 'copaw' - the source identifier should be preserved. |
| tencent/HY-MT1.5-7B | Copaw |
轻微
[风格]
"Copaw"
理由: Capitalization inconsistency - reference uses lowercase 'copaw' while hypothesis uses 'Copaw' | Capitalization differs. Hypothesis uses 'Copaw' while reference uses lowercase 'copaw'. | Inconsistent capitalization compared to the reference 'copaw'. Proper nouns should match the official branding casing. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | bar | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | tavern |
轻微
[风格]
"tavern"
理由: Lowercase form is slightly inconsistent with reference capitalization; meaning is correct. | Hypothesis uses lowercase 'tavern' but reference uses capitalized 'Tavern' which is more appropriate for a title/category. | As a standalone label or title corresponding to the source '酒馆', the word should be capitalized ('Tavern') to match standard title casing conventions. | Case difference - reference uses capitalized 'Tavern' |
| Qwen/Qwen2.5-14B-Instruct | Tavern | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Tavern | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Tavern | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Tavern | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Tavern | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Bar | 通过 (无共识错误) |
| google/gemma-3-12b-it | Tavern | 通过 (无共识错误) |
| google/gemma-3-1b-it | Bar | 通过 (无共识错误) |
| google/gemma-3-4b-it | Buy Now |
致命
[准确性]
"Buy Now"
理由: Complete mistranslation - '酒馆' means tavern, not buy now | Completely incorrect translation. '酒馆' means 'Tavern', not 'Buy Now'. | Totally incorrect; "酒馆" means tavern/pub, unrelated to a purchase call-to-action. | Complete mistranslation of '酒馆' (Tavern). | Complete mistranslation. The source '酒馆' means 'Tavern' or 'Pub', while the hypothesis translates it as 'Buy Now', which is semantically unrelated. |
| google/translategemma-12b-it | Tavern | 通过 (无共识错误) |
| google/translategemma-4b-it | Shop |
严重
[准确性]
"Shop"
理由: The translation 'Shop' is inaccurate for '酒馆' (Tavern/Inn), which implies a social gathering place rather than a retail store. | Incorrect translation - 'Shop' does not convey '酒馆' (Tavern) | The translation is somewhat close but not fully accurate. | Hypothesis 'Shop' does not match source '酒馆' (Tavern). Incorrect translation of the specific venue type. | Partially related but inaccurate; '酒馆' is specifically a 'tavern' or 'pub', not a generic 'shop'. |
| tencent/HY-MT1.5-1.8B | 酒馆 |
严重
[准确性]
"酒馆"
理由: The hypothesis repeats the source text instead of translating it into the target language (English) as implied by the reference. | Hypothesis is in Chinese (source language) but target language is English. Should be 'Tavern'. | Source text copied; no translation into English ('tavern') provided. | Hypothesis retains source language instead of translating to target language (zh->en) |
| tencent/HY-MT1.5-7B | tavern |
轻微
[风格]
"tavern"
理由: Capitalization differs. Hypothesis uses lowercase 'tavern' while reference uses capitalized 'Tavern'. | Inconsistent capitalization. As a UI label or proper noun equivalent, it should likely be capitalized 'Tavern' to match the reference and standard title casing. | Capitalization - reference uses capitalized 'Tavern' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | VR | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | VR | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | VR | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | VR | 通过 (无共识错误) |
| Qwen/Qwen3-14B | VR | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | VR | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | VR | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Virtual Reality | 通过 (无共识错误) |
| google/gemma-3-12b-it | VR | 通过 (无共识错误) |
| google/gemma-3-1b-it | VR | 通过 (无共识错误) |
| google/gemma-3-4b-it | Submit |
致命
[准确性]
"Submit"
理由: Completely wrong; "VR" is a virtual reality acronym, unrelated to submitting. | Complete mistranslation of 'VR'. | Complete mistranslation - 'VR' translated as 'Submit' with no relation | Complete hallucination. The source 'VR' is an acronym for Virtual Reality, while the hypothesis translates it as 'Submit', which is semantically unrelated. | Completely incorrect translation. 'VR' is an acronym for Virtual Reality and should remain 'VR', not be translated to 'Submit'. |
| google/translategemma-12b-it | VR | 通过 (无共识错误) |
| google/translategemma-4b-it | VR | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | VR | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | VR | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Discord | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Discord | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Discord | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Discord | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Discord | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Discord | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Discord | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Discord | 通过 (无共识错误) |
| google/gemma-3-12b-it | Discord | 通过 (无共识错误) |
| google/gemma-3-1b-it | Discord | 通过 (无共识错误) |
| google/gemma-3-4b-it | Unblocked |
致命
[准确性]
"Unblocked"
理由: Complete hallucination. The source 'Discord' is a proper noun (platform name), while the hypothesis translates it as 'Unblocked', which is semantically unrelated. | Complete mistranslation - 'Discord' translated as 'Unblocked' with no relation | Incorrect; "Discord" is a platform name and not semantically related to "Unblocked". | Complete mistranslation of 'Discord'. | Completely incorrect translation. 'Discord' is a proper name (communication platform) and should remain 'Discord', not be translated to 'Unblocked'. |
| google/translategemma-12b-it | Discord | 通过 (无共识错误) |
| google/translategemma-4b-it | Viber |
致命
[准确性]
"Viber"
理由: Hypothesis 'Viber' does not match source 'Discord'. Complete mistranslation - wrong platform name. | Wrong entity - 'Viber' is incorrect; source and reference both indicate 'Discord' | The translation replaces the specific proper noun 'Discord' with a different app name 'Viber', which is factually incorrect. | The translation does not match the source or reference meaning. | Wrong application name; should remain 'Discord' as in the source and reference. |
| tencent/HY-MT1.5-1.8B | Discord | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Discord | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The Forum Admin/Founder/Neo has posted something, and it's a front-row chat (≧∇≦) |
严重
[准确性]
"The Forum Admin/Founder/Neo"
理由: Mistranslation: '始皇' refers to a forum admin/founder nickname, not literal 'Forum Admin/Founder/Neo' | Unclear translation of '始皇'. Context suggests this is a username/nickname, not necessarily 'Forum Admin/Founder/Neo'. | Here “始皇” is a joking nickname for the original poster, not “Forum Admin/Founder/Neo”. | Mistranslation of '始皇' (Qin Shi Huang), a specific meme reference for a user, into unrelated titles.
严重
[准确性]
"it's a front-row chat"
理由: “前排沙发” means grabbing the 'sofa' (the first reply) or claiming a front-row spot, not a “front-row chat”. | Mistranslation: '前排沙发' is internet slang for 'first to reply' or 'front row seat', not 'chat' | The term '沙发' should be translated as 'seat' rather than 'chat'. | Mistranslation of '沙发' (sofa), which is internet slang for the first comment/post, not a chat. |
| CohereLabs/tiny-aya-water | The Forum Admin has posted something, front row of the chat (≧∇≦)ノ. |
严重
[准确性]
"The Forum Admin has posted something"
理由: In context, “始皇” is a joking nickname for the original poster or a respected veteran, not the forum admin; “发帖了” also sets up a race-to-reply scene, so the meaning is misread. | Misinterpretation: 始皇 refers to a user nickname, not 'Forum Admin' | Mistranslation of '始皇' (Shi Huang/First Emperor), a specific community meme referring to a user or entity, not a generic forum admin.
轻微
[准确性]
"front row of the chat"
理由: “前排沙发” is forum slang for grabbing the 'sofa' (claiming the first reply), which is more specific than a generic “front row of the chat”; the meaning drifts slightly. | '前排沙发' is internet slang for being the first to reply/comment on a post, not physically sitting in a chat room. | 沙发 (sofa) is internet slang for 'first reply/post', not 'front row of the chat'. Should be 'sofa/first post'. |
| Qwen/Qwen2.5-14B-Instruct | The Admin posted, front row sofa (≧∇≦)ノ。 |
严重
[准确性]
"The Admin"
理由: “始皇”在论坛语境通常指“版主/楼主/管理员中的特定昵称/站长”,直译为“The Admin”过于具体且不一定指一般管理员,存在误指风险。 | Incorrect translation of 始皇 - should be 'Qin Shi Huang' or 'the admin/owner' in context, not literal 'Admin' | 始皇 refers to Qin Shi Huang (historical figure), not 'Admin'. Context suggests a forum username, not a literal translation. | '始皇' is a specific humorous nickname for a user, translating it as 'The Admin' changes the specific reference to a generic role. |
| Qwen/Qwen2.5-7B-Instruct | Neo posted, front row sofa (≧∇≦)ノ. |
严重
[准确性]
"Neo"
理由: “始皇”是秦始皇,指原帖楼主或管理员的昵称,不是Neo,专有名词误译。 | The slang term '始皇' is not properly translated. | Incorrect translation of 始皇 - should be 'First Emperor' or transliterated, not 'Neo' | Mistranslation of '始皇' (First Emperor/Qin Shi Huang) as 'Neo'. In this context, it is likely a specific username or a meme reference that should be transliterated or kept, not replaced with an unrelated name. |
| Qwen/Qwen3-14B | The Admin posted a thread, first seat sofa (≧∇≦)ノ. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | The Admin just posted something. Front row sofa (≧∇≦)ノ. |
严重
[准确性]
"The Admin"
理由: “始皇”在论坛语境往往指楼主或特定ID,不是泛指管理员;直译为“the admin”不准确 | '始皇' is a specific nickname (First Emperor/Qin Shi Huang) often used humorously for a specific user or admin, translating to generic 'Admin' loses the specific cultural reference. | Incorrect translation: 始皇 (Qin Shi Huang / 'First Emperor') is a forum admin nickname, not a generic 'Admin'. |
| Qwen/Qwen3.5-9B | Neo has posted! Front row sofa (≧∇≦)ノ. |
严重
[准确性]
"Neo"
理由: Mistranslation of '始皇' (First Emperor/Qin Shi Huang) as 'Neo', losing the specific cultural reference and humor. | Incorrect translation: 'Neo' is a mistranslation of '始皇' (which likely refers to a username or handle, not 'Neo'). The source likely refers to a specific person or account named 始皇, not the name Neo. | 始皇 (Qin Shi Huang) is a historical reference/username, not 'Neo'. Should preserve the original name or provide context. | The term '始皇' is a slang term in gaming context, translating to 'Neo' might not capture the intended slang. | “始皇”是“秦始皇”的戏称,用于指原帖楼主或ID,并非指“Neo”。人名被随意替换,含义严重偏离。 |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The founder has posted something, with a (≧∇≦) expression. |
严重
[准确性]
"The founder has posted something, with a (≧∇≦) expression."
理由: Incorrect translation of 始皇 - should be 'Qin Shi Huang' or 'the First Emperor', not 'founder'. Also '前排沙发' (front row sofa/seat) is omitted entirely. | “始皇”在此语境多指“始皇帝/秦始皇”或论坛ID,且“发帖了,前排沙发”包含“发帖”“抢沙发”等含义,译文将其泛化为“founder”并省略抢沙发含义,信息明显不足且误解称呼。 | Mistranslation of '始皇' (Shi Huang/First Emperor), a specific nickname, as 'The founder'. Also missed the slang '前排沙发' (first comment/front row sofa). | The term '始皇' is a slang term that might not be accurately translated as 'founder'. |
| google/gemma-3-12b-it | Neo posted a new thread, first row sofa (≧∇≦)ノ. | 通过 (无共识错误) |
| google/gemma-3-1b-it | The user posted a post. |
严重
[准确性]
"The user posted a post."
理由: Missing context about '前排沙发' (front row sofa - internet slang for being first to comment) and the emoticon. Translation is too generic and loses the casual internet culture reference. | Missing slang interpretation: '始皇' is a username/nickname, '前排沙发' means 'first seat/early commenter', and the emoticon is omitted entirely | Complete failure to translate the source text. The hypothesis ignores the specific subject '始皇' (First Emperor), the action '发帖了' (posted), the forum slang '前排沙发' (first comment/front row sofa), and the emoticon. It provides a generic, unrelated sentence. | Mistranslation; source means 'Qin Shi Huang made a post, front-row/sofa seat (first comment)', with slang and speaker identity omitted. | The translation does not capture the slang and cultural references in the source text. |
| google/gemma-3-4b-it | The Administrator has posted! (≧∇≦)ノ. |
严重
[准确性]
"The Administrator"
理由: “始皇”在论坛语境中是对原帖作者或版主的戏称,非正式头衔“Administrator”且缺失调侃语气。 | Incorrect translation: '始皇' means 'First Emperor' or a forum admin/owner, not 'Administrator' in general. The term refers to a specific person or title in this context. | Mistranslation of '始皇' (First Emperor/Qin Shi Huang), which is likely a specific username or community meme, not a generic administrator title.
严重
[准确性]
"前排沙发"
理由: 完全遗漏了“抢沙发、占楼”含义,仅保留颜文字,未传达在帖子下抢首楼的含义。 | The term '沙发' is translated as 'posted', but it should be translated as 'sofa' to maintain the slang meaning. | Omission of '前排沙发' (front row sofa), a common internet slang for being the first to comment. | Missing translation of '前排沙发' (first row sofa - internet slang for first comment). Completely omitted from hypothesis. |
| google/translategemma-12b-it | The admin (Neo) posted a new thread, first row seat (≧∇≦)ノ. |
严重
[准确性]
"The admin (Neo)"
理由: Hallucination: 'Neo' is not present in the source text. '始皇' is a nickname referring to Qin Shi Huang or a specific user, not necessarily an admin named Neo. | Incorrectly identifies 始皇 as 'admin (Neo)'. 始皇 is a username/nickname, not an admin title. The reference to Neo is an unwarranted addition. | “始皇”是对楼主/版主的戏称,非特指“管理员Neo”;括号中的Neo属无根据添加信息。 |
| google/translategemma-4b-it | The admin has posted, please take the first seat (≧∇≦)ノ. |
严重
[准确性]
"The admin has posted, please take the first seat"
理由: Misinterpretation of slang '前排沙发' (first comment/seat) as a command to the reader rather than a statement that the speaker took the first spot. '始皇' is likely a specific username or meme reference, not necessarily 'admin'. | Incorrect translation: '始皇' means 'Qin Shi Huang' (the first emperor) or a forum admin/owner, not just 'admin'. The term carries internet slang meaning of a forum owner or administrator. | Misinterprets “始皇” (nickname, e.g. for OP/author) as “admin” and “前排沙发” as an instruction rather than the speaker claiming the first comment/seat; pragmatic meaning is wrong. |
| tencent/HY-MT1.5-1.8B | The '始皇' posted a message, front-row sofa(≧∇≦)ノ. | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | The Emperor has posted a new message. Front-row sofa (≧∇≦)ノ. |
严重
[准确性]
"Front-row sofa"
理由: The term '沙发' in Chinese slang refers to the first-floor seats in a forum, not a literal sofa. | Literal translation of the slang '沙发' (sofa), which means 'first comment' in forum context. Should be translated as 'first comment' or equivalent. | Mistranslation of internet slang '前排沙发'. This refers to being first to comment/reply in a forum thread, not literal sofa. Should be 'front-row seat' or 'first comment'. | “前排沙发”是论坛里“抢到一楼”的网络用语,含义是抢到首条回复,不能按字面译为实际沙发。 |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Ask for a CC number, is there anyone who can help? |
严重
[准确性]
"Ask for a CC number"
理由: Mistranslation: 'CC号' likely refers to 'credit card number' or a coupon code, not 'CC number'. '佬友' is slang for 'buddy/friend', not 'anyone' | “求一个CC号”是请求他人提供一个 CC 账号/帐号,并非泛指“一个号码” | In this context, 'CC 号' likely refers to a Cloudflare account or similar service credential, not a Credit Card number. Translating as 'CC number' creates a dangerous misunderstanding regarding financial data.
轻微
[准确性]
"is there anyone who can help?"
理由: Loss of specific slang '佬友' (experienced friends/users) and the transactional nature of '出' (sell/offer). | “有佬友出吗”更具体是“有哪位大佬愿意出/分享/转让吗”,不只是泛泛“帮忙” | Missing meaning: '有佬友出吗' means 'does any buddy have one to sell?' not 'can help' |
| CohereLabs/tiny-aya-water | Ask for a CC number and see if anyone has one. |
严重
[准确性]
"Ask for a CC number and see if anyone has one."
理由: 原句是“求一个CC号,有佬友出吗”,包含向大佬求出让/转让的含义,“see if anyone has one”较弱,未体现“出/转让”的语气。 | Misinterpretation: CC号 likely refers to a credit card or coupon code, not 'CC number' in the computing context | In this context, 'CC 号' likely refers to a specific service account or code, whereas 'CC number' in English strongly implies 'Credit Card number', creating a dangerous misunderstanding. | CC号 refers to credit card numbers in hacker/fraud context. 'Ask for a CC number' is too direct; should convey the illicit nature or use appropriate terminology. |
| Qwen/Qwen2.5-14B-Instruct | Need a CC number, anyone got one? |
严重
[准确性]
"CC number"
理由: 佬友 is internet slang for 'friends/buddies' in hacker communities, not captured. 'CC号' likely refers to credit card numbers in this context, which is sensitive but the translation misses the slang register. | In this context, 'CC 号' likely refers to a Cloudflare account or similar service handle, not a Credit Card number. The translation creates a misleading and potentially unsafe implication. | “CC号”在该类语境多指账号、服务账号等,直译为“number”可能误解为“信用卡号”等数字编号,语义不清。 |
| Qwen/Qwen2.5-7B-Instruct | Looking for a CC number, anyone willing to share? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Looking for a CC number, does anyone have one to sell? | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Looking for a CC number, anyone available to provide one? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Looking for a CC account, does anyone have one to sell? |
轻微
[准确性]
"does anyone have one to sell"
理由: '出' in this context implies offering or selling, but 'does anyone have one to sell' is slightly verbose compared to the concise slang 'Anyone selling?' though acceptable. | '出' implies asking if anyone has it available, not necessarily 'to sell'. The translation adds commercial intent not clearly in source. | The term '佬友' is slang for friends; translating it as 'anyone' loses the nuance. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Looking for a CC number, anyone have one? |
轻微
[准确性]
"Looking for a CC number, anyone have one?"
理由: The term '佬友' is a slang term that might not be accurately translated as 'anyone'. | Failed to translate slang 'CC 号' (likely Cloudflare/Credit Card context depending on domain, but usually a specific account type) and missed '佬友' (slang for experienced users/friends). | “佬友”是对坛友/大佬的称呼,译为“anyone”丢失敬称/同好色彩,整体语义基本对,但略有语气和社交色彩损失。 |
| google/gemma-3-12b-it | Need to find an HS code, anyone have one? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Please find a CC number and if there are any members who can help. |
严重
[准确性]
"Please find a CC number"
理由: Misinterprets '求一个CC号' - should be 'looking for a CC account/number' (shared account context). '佬友' means 'buddy/friend in the community' not just 'members'. Translation misses the specific request context. | The translation does not accurately capture the slang and context of the source text. | Mistranslation of intent. '求一个 CC 号' means 'Looking for a CC number' (requesting), not 'Please find...'. '佬友' is specific community slang for 'experienced friends/members', translated vaguely as 'members'. The sentence structure is awkward. | '求一个CC号' means 'seeking/begging for a CC account/number', not asking someone to 'find' one. | Incorrect translation: 'CC号' refers to a specific service/account (likely a specific platform account), not a general 'CC number'. '求' means 'seeking/looking for', not 'find' |
| google/gemma-3-4b-it | Looking for a CC number, anyone interested in selling? | 通过 (无共识错误) |
| google/translategemma-12b-it | Looking for a CC number; is anyone willing to provide one? | 通过 (无共识错误) |
| google/translategemma-4b-it | Buy Now |
致命
[准确性]
"Buy Now"
理由: Complete hallucination. The source asks for a CC number and if anyone is selling, whereas the translation is an unrelated call to action. | The translation does not match the source text and is incorrect. | Complete mistranslation. Source asks for a CC number (credit card), not 'Buy Now'. The hypothesis is entirely unrelated to the source. | Completely unrelated to “求一个CC号,有佬友出吗”, which is asking if anyone is offering a CC account/number; original request meaning is lost. | Completely incorrect translation: The source asks '求一个CC号,有佬友出吗' (seeking a CC number, does any friend have one to sell?). The hypothesis 'Buy Now' is completely unrelated and appears to be a button label or mistranslation. |
| tencent/HY-MT1.5-1.8B | Do you want to find a CC number? Anyone can help? |
严重
[准确性]
"Do you want to find a CC number?"
理由: 求 means 'asking for' not 'wanting to find'. Should be 'Looking for a CC number' or 'Anyone have a CC number?' | 主语错误,原文是“求一个CC号”,表示说话人求购/求一个账号,而非询问对方是否想找。 | Misinterpreted the imperative '求' (looking for/requesting) as a question to the reader. | Incorrect translation. '求一个CC号,有佬友出吗' means 'Looking for a CC (credit card) number, does any friend have one to sell?' The translation incorrectly interprets '求' as 'Do you want' and '出' as 'help', losing the meaning of seeking to buy/obtain a CC number. |
| tencent/HY-MT1.5-7B | Can someone please provide a CC number? |
致命
[准确性]
"CC number"
理由: Misinterpretation of slang. 'CC 号' in this context likely refers to a specific account type or service code (e.g., Cloudflare, CC attack related, or a specific platform ID), not a Credit Card number. Translating as 'CC number' implies financial data, which is a severe safety and accuracy error. | Incomplete translation. Missing context about '佬友' (slang for friends/buddies) and '出' (to provide/sell). The full meaning involves asking if anyone has a CC account/credit card to share, not requesting a CC number directly. | “CC号”在语境中通常指某种账号/邀请码等,直译为“CC number”意义不明,且可能被理解为信用卡号。 |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | It seems not fixed yet, let's go and abuse the beginner. |
致命
[准确性]
"let's go and abuse the beginner"
理由: “上去爽蹬会儿”是趁漏洞没修复赶紧上去爽/利用一下,与“虐菜/虐新手”无关,含义被严重误解 | The term '爽蹬' refers to enjoying or having fun, not abusing a beginner. | Severe mistranslation of '爽蹬'. In this context (likely a game or system exploit), it means to enjoy using the glitch/exploit freely. It does not mean abusing a person/beginner. | Completely wrong translation: '爽蹬' is slang meaning 'enjoy/use something freely' or 'take advantage of', not 'abuse'. '新手' is not in the source |
| CohereLabs/tiny-aya-water | It seems not fixed yet, let's go and abuse the beginner. |
致命
[准确性]
"let's go and abuse the beginner."
理由: The slang '爽蹬' is not accurately translated, potentially altering the intended meaning. | Complete hallucination/mistranslation. '爽蹬会儿' is slang for enjoying a service (likely a proxy/server) while it lasts; it has no relation to abusing beginners. | 爽蹬 is slang meaning to exploit/abuse a vulnerability or system. 'abuse the beginner' is incorrect; should be 'exploit the vulnerability' or similar. | Completely wrong: 爽蹬 means 'enjoy riding/using' (positive), not 'abuse' | “爽蹬会儿”在此是利用尚未修复的漏洞/机制来爽一把,“abuse the beginner”引入“新手”概念且语义错误。 |
| Qwen/Qwen2.5-14B-Instruct | It seems not fixed yet, hurry up and abuse it for a while. |
轻微
[准确性]
"abuse it"
理由: '爽蹬' implies enjoying the usage of an exploit or feature freely; 'abuse' carries a more negative connotation than the playful tone of the source. | The term 'abuse' is incorrectly used; it should be 'enjoy' or 'have fun'. | 爽蹬 is internet slang meaning to exploit/take advantage of something, not simply 'abuse'. The translation is too literal and misses the colloquial meaning. | “爽蹬”更偏“趁机爽用/好好玩一会儿”,带享受意味,“abuse”偏滥用,语气更负面,语义不完全吻合。 |
| Qwen/Qwen2.5-7B-Instruct | It seems it hasn't been fixed yet, better go abuse it for a while. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | It seems like the issue hasn't been fixed yet. Let's go exploit it while we can. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Seems not fixed yet, better go up and abuse it for a while. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Seems not fixed yet, better go abuse it for now. |
轻微
[风格]
"go abuse it"
理由: “爽蹬”是利用未修复问题来爽用/薅羊毛,“abuse it”在技术语境中可用,但语气更负面,略有风格偏差。 | Inappropriate word choice: 'abuse' is too strong and carries negative connotations. The slang '爽蹬' means to enjoy/use something freely or exploit it for fun, not to abuse. | '爽蹬' implies enjoying the exploit while it lasts; 'abuse' captures the action but misses the nuance of enjoyment ('爽'). |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Apparently, it hasn't been fixed yet. Hurry and go have some fun. |
轻微
[准确性]
"Hurry and go have some fun."
理由: Mistranslation of '爽蹬' (slang for using a service vigorously/enjoyably before it breaks), reduced to generic 'have some fun'. | “爽蹬会儿”是网络俚语,含“趁没修复抓紧薅/利用一把”的意味,译为“have some fun”较宽泛,未体现“趁bug/未修复抓紧上”的语义细节,但大意仍通。 | The term '爽蹬' is a slang term that might not be accurately translated as 'have some fun'. |
| google/gemma-3-12b-it | It doesn't seem like it's been fixed yet, so let's quickly abuse it for a while. | 通过 (无共识错误) |
| google/gemma-3-1b-it | The bug hasn't been fixed yet, so let's abuse it quickly. |
严重
[准确性]
"The bug hasn't been fixed yet, so let's abuse it quickly."
理由: '爽蹬会儿' is internet slang meaning to exploit/take advantage of something, not 'abuse it'. The translation captures the general meaning but '爽蹬' has specific connotations of enjoying the exploit that are lost. | Incorrect interpretation: '爽蹬' is internet slang meaning 'to enjoy/exploit', not 'abuse'. The tone is about enjoying the unfixed bug, not malicious abuse | Meaning mostly correct; misses slang nuance '赶紧上去爽蹬会儿' (go up and enjoy using/abusing it for a while). |
| google/gemma-3-4b-it | It seems like it hasn't been fixed yet, let's quickly abuse this bug for a while. |
严重
[准确性]
"abuse this bug"
理由: Over-interpretation. '爽蹬' implies enjoying the action (likely climbing or exploiting a mechanic), but explicitly adding 'abuse this bug' adds information not present in the source, though contextually probable, it's an addition. | Incorrect translation: '爽蹬' is slang meaning 'to enjoy/use something freely while it lasts' (like riding a bike), not 'abuse a bug'. The term implies enjoying a loophole or opportunity, not malicious exploitation. | “爽蹬会儿”更偏向“趁机好好玩一阵”或“爽用一会”,虽然常含利用漏洞之意,但直译为“abuse bug”略显过度明确,语气比原文更重。 |
| google/translategemma-12b-it | It doesn't seem to be fixed yet, so go ahead and quickly exploit it for a while. | 通过 (无共识错误) |
| google/translategemma-4b-it | It seems like it hasn't been fixed yet, so let's abuse it for a while. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | It seems that the issue hasn't been fixed yet. Let's rush up and abuse it for a while. |
严重
[准确性]
"abuse it"
理由: 爽蹬 is internet slang meaning to exploit/take advantage of a bug or vulnerability. 'Abuse' is too strong and lacks the technical context of exploiting an unfixed bug. | Incorrect translation of '爽蹬'. '蹬' in this context is internet slang meaning 'to use/exploit' (as in riding something). '爽蹬' means 'to enjoy using/exploiting something'. The translation 'abuse it' is too negative and misses the casual, playful tone of the original. | “爽蹬”多指利用/薅漏洞爽玩一阵,译为“abuse it”略带错误或攻击含义,语义偏差但大体可接受。 | Mistranslation of '爽蹬' (enjoy using/exploiting a loophole for fun). 'Abuse' carries a negative connotation not present in the playful context. |
| tencent/HY-MT1.5-7B | "en": "It seems it hasn't been fixed yet. Hurry up and take advantage of it before it gets patched.", "zh-TW": "看样子还没有修复。赶快利用它吧,在被修补之前。」、「ja": "どうやらまだ修正されていないようです。早く利用してください、修正される前に。」「ko": "看来还没有修复。赶紧利用它吧,在被修复之前。」、「de": "Es scheint, dass es noch nicht behoben wurde. Nutzen Sie es schnell, bevor es repariert wird。」、「fr": "Il semble que cela n’ait pas encore été réparé. Profitez-en rapidement avant qu’il ne soit corrigé。」、「it": "Sembra che non sia ancora stato riparato. Approfittatene subito prima che venga sistemato。」}|im_end|> |
致命
[准确性]
"It seems it hasn't been fixed yet. Hurry up and take advantage of it before it gets patched."
理由: “赶紧上去爽蹬会儿”是上去“爽玩一阵”的俚语,译为“take advantage of it”勉强可理解,但缺少“玩得爽一会儿”的语气色彩。 | Hypothesis contains malformed multilingual output with mixed language tags and formatting errors. The translation itself misses '爽蹬会儿' which means 'enjoy/exploit it for a while'. Also contains structural corruption with multiple language codes. | The hypothesis contains raw model output including multiple languages and JSON formatting artifacts instead of a clean English translation. Complete failure to generate the target text. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Get two Plus slots and pay quarterly. |
严重
[准确性]
"Get two Plus slots"
理由: Wrong translation direction: '出' means 'sell/offer' not 'get'. The sentence means 'Selling two Plus slots, quarterly payment' | “出两个 Plus 车位”表示出售/转让两个 Plus 账号车位,不是“获取两个车位” | Mistranslation. '出两个 Plus 车位' means 'selling/offering two Plus slots', not 'get'. Should be 'selling' or 'offering'. | Mistranslation of '出' which means 'selling' or 'offering for sale' in this context, not 'getting' or 'buying'. |
| CohereLabs/tiny-aya-water | Get two Plus slots and pay quarterly. |
严重
[准确性]
"Get two Plus slots"
理由: 语气应是“出两个 Plus 车位”= 出售/出让两个 Plus 账号/车位,“Get”更像是获得而非转让,对“出”的方向有偏差。 | 车位 (parking spot) is slang for account/subscription slot. 'Get' is incorrect; should be 'Selling' or 'Offering two Plus slots'. | '出' in this context means 'selling' or 'offering', not 'getting' or 'buying'. The direction of the transaction is reversed. | Wrong direction: 出 means 'selling' or 'offering', not 'getting' |
| Qwen/Qwen2.5-14B-Instruct | Offer two Plus slots, paid quarterly. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Offer two Plus slots, quarterly payment. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Offer two Plus slots, seasonal payment. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Two Plus slots available, quarterly payment. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Create two Plus slots, quarterly payment. |
严重
[准确性]
"Create"
理由: Incorrect translation: '出' in this context means 'to sell' or 'to offer for sale', not 'create'. The phrase refers to selling two Plus membership slots. | '出' in this context means 'selling' or 'offering for sale', not 'create'. | “出两个 Plus 车位”中的“出”是“出让/出售”,不是“创建”;语义方向错误。 | 出 means 'selling/offering', not 'create'. Should be 'Selling two Plus slots' or 'Offering two Plus parking spots'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Two shared subscription slots (Plus seats) |
严重
[准确性]
"Two shared subscription slots (Plus seats)"
理由: Missing translation of '季付' (quarterly payment). The term 'Plus seats' is also a literal translation that doesn't convey the subscription sharing context. | Missing the verb '出' (selling/offering) and the payment term '季付' (quarterly payment). The translation is a fragment, not a full sentence. | 原文“出两个 Plus 车位,季付”表示转让两个Plus账号共享车位,并包含“季付”(按季度付款)的关键信息。译文省略季付,并把“出”误作“shared”而非“offer/sell”,易误解实际交易含义。 |
| google/gemma-3-12b-it | Get two Plus slots, quarterly billing. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Two Plus shared subscriptions for two people. |
严重
[准确性]
"Two Plus shared subscriptions for two people."
理由: '出两个 Plus 车位,季付' means 'selling two Plus slots, paid quarterly'; not necessarily 'for two people' and must mention selling and quarterly payment. | Hallucination and omission. The source says '出两个 Plus 车位' (Selling two Plus 'parking spots'/slots) and '季付' (quarterly payment). The hypothesis invents 'for two people' and omits the payment term 'quarterly'. '车位' in this context refers to subscription slots, not necessarily 'shared subscriptions' explicitly, though implied, the payment term is a critical missing detail. | Completely misses the meaning. '车位' (parking spot) is internet slang for shared account slots. '季付' means quarterly payment. Should be 'Selling two Plus account slots, quarterly payment'. | Incorrect: 'Plus 车位' refers to a shared subscription slot (often for streaming services), not 'for two people'. '季付' means 'quarterly payment', completely omitted |
| google/gemma-3-4b-it | Share two Plus slots, seasonal payment. | 通过 (无共识错误) |
| google/translategemma-12b-it | Get two Plus slots, billed quarterly. |
严重
[准确性]
"Get two Plus slots"
理由: “出两个 Plus 车位”是“出让/转让两个Plus账号共享名额”,不是“去获得”;应为“selling/offering two Plus slots/accounts”。 | '出' means 'selling/offering' (for sale), not 'get' (buying). The direction is reversed. | Mistranslation of '出'. In marketplace slang, '出' means 'to sell' or 'offering', whereas 'Get' implies buying or acquiring. |
| google/translategemma-4b-it | Reserve two Plus slots, seasonal subscription. |
严重
[准确性]
"Reserve two Plus slots"
理由: '出' in this context means 'selling' or 'offering', not 'reserving'. The user is offering slots for sale. | Incorrect translation: '出' means 'to sell' in this context, not 'reserve'. The source is about selling two Plus membership spots with quarterly payment, not reserving. | The term 'reserve' is slightly off; 'offer' or 'provide' would be more accurate. | “出两个 Plus 车位” means offering/selling two Plus slots, not reserving them; direction of transaction is reversed. |
| tencent/HY-MT1.5-1.8B | "en": "Out two Plus parking spaces, paid quarterly.", "zh-TW": "出兩個Plus停車位,季度支付。", "ja": "出二つのPlus駐車スペース、季節的に支払う。", "ko": "출 두 개의 Plus 주차 공간, 분기별로 지불한다.", "de": "Zwei Plus-Parkplätze, vierteljährlich bezahlt. ", "fr": "두 개의 Plus 주차 공간, 분기별로 지불합니다.", "it": "Due to due due due due due … ["due" repeats for the remainder of the output] |
严重
[准确性]
""en": "Out two Plus parking spaces, paid quarterly.""
理由: The hypothesis contains a malformed JSON with repeated text and appears to be a placeholder or error. The actual translation 'Out two Plus parking spaces' is also incorrect - '出' here means 'selling' (as in selling a spot), not 'out'. The entire hypothesis is nonsensical. | Hypothesis contains corrupted/repeated text instead of actual translation. The source is about selling parking spaces with quarterly payment, but the hypothesis is malformed with repetitive garbage text. | “出两个 Plus 车位”多为出售/转让账号/名额的隐喻,直译为“parking spaces”严重误解语境。 |
| tencent/HY-MT1.5-7B | "en": "Provide two Plus vehicle slots for quarterly payment.", "zh-TW": "提供两个Plus车位,按季度支付.", "ja": "提供两个Plus车位,采用季度付款方式。, "ko": "제공하는 두 개의 Plus 차량 공간은 분기별로 결제합니다.", "de": "Bereitstellen Sie zwei Plus-Fahrzeugplätze für Quartalszahlungen.", "fr": "Fournir deux places pour véhicules Plus avec paiement trimestriel.", "it": "Fornire due posti per veicoli Plus con pagamento trimestrale."}|im_end|> |
致命
[准确性]
"Provide two Plus vehicle slots for quarterly payment."
理由: The hypothesis contains raw model output including multiple languages and JSON formatting artifacts instead of a clean English translation. | Hypothesis contains malformed multilingual output with mixed language tags and formatting errors. The translation misses context that '出' means 'selling/offering' and 'Plus车位' is internet slang for premium parking/service slots. | “Plus 车位”在圈内通常指某平台/套餐的账号名额或车位,非实体“车辆停车位”;“出两个”应为“出/转让/出一个名额”而非“provide”。 |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The driver left after just two days, so I had to bail out. |
严重
[准确性]
"The driver left"
理由: Mistranslation. '车主' means 'car owner/service provider', not 'driver'. In this context, it refers to the service provider who disappeared. | Partial mistranslation: '车主' refers to the seller/provider who ran away, not 'driver'. '跳车' is slang for 'exit/abandon a project' not literal 'bail out' | “车主跑路”是开车人跑路/失联,多用于账号拼车等语境;虽译为“driver”基本对,但“owner/organizer”更贴近语境
轻微
[风格]
"bail out"
理由: 语气略偏正式/字面,原句属网络吐槽口吻,可用更口语的“jump ship / forced to get off”以保持风格 | Using literal interpretation of slang term | '跳车' (jump off the car) is translated adequately as 'bail out', but the connection to the scam context could be clearer. |
| CohereLabs/tiny-aya-water | The driver left after just two days, forcing me to bail out. |
严重
[准确性]
"The driver left after just two days, forcing me to bail out."
理由: Misinterpretation: 车主 refers to the service provider/seller, not a driver; 跳车 means 'exiting a subscription/service' | 跳车 in this context means to exit/withdraw from a scheme or investment, not 'bail out'. Should be 'forced to jump ship' or 'exit the scheme'. | Literal translation of '车主' (car owner) and '跑路' (run away). In this slang context, it refers to a service provider shutting down and absconding with funds, not a literal driver. | “刚上车两天车主就跑路了,被迫跳车”是比喻“刚跟团两天组织者跑路,被迫退出/止损”,字面“driver left”虽接近原意但未体现团购/合租语境的比喻含义。 |
| Qwen/Qwen2.5-14B-Instruct | Just two days after joining, the owner bailed, forcing me to exit. |
轻微
[准确性]
"forcing me to exit"
理由: 上车 and 跳车 are internet slang for joining/exiting a scheme or investment. 'Joining' and 'exit' are too generic; these terms have specific connotations in fraud/scam contexts. | “跳车”在此为“被迫退出/止损”,带轻微无奈或调侃色彩,“exit”较中性,情感色彩略有欠缺但整体可理解。 | '跳车' (jump off the car) is specific slang for leaving a failing project/service abruptly. 'Exit' is too generic and loses the metaphorical nuance. |
| Qwen/Qwen2.5-7B-Instruct | The owner ran away just two days after I got on the vehicle, forcing me to bail out. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | The owner ran away two days after I got on the car, forcing me to bail out. | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | The owner ran away after two days on board, forcing me to bail out. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | The owner ran away after just two days on board, forcing me to bail out. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The car owner jumped out just two days after taking the car. |
Major
[Accuracy]
"The car owner jumped out just two days after taking the car."
Reason: The term '跑路' is a slang term that means 'to disappear' or 'go bankrupt', not 'jumped out'. | Literal mistranslation of slang. '车主跑路' means the provider ran away/scammed, not physically jumped out. '被迫跳车' means forced to leave the service, not jump from a vehicle. | “上车”, “车主跑路”, and “跳车” are all account/slot metaphors: joining a share, the owner absconding with the money, and being forced to quit. The translation reads them literally as a real car scenario, misunderstanding the whole scene. | Incorrect translation of '上车' (getting access/subscription) and '跳车' (leaving/canceling). The metaphor of 'car' for subscription is completely lost. |
| google/gemma-3-12b-it | The owner ran away after only two days of using the service, forcing me to bail. | Pass (no consensus errors) |
| google/gemma-3-1b-it | The owner ran away after only two days on the ride, forced to jump. |
Major
[Accuracy]
"The owner ran away after only two days on the ride, forced to jump."
Reason: '上车' and '跳车' are internet slang for joining/leaving a shared account scheme. '跑路' means the owner disappeared/scammed. Translation is too literal and misses the context of account sharing schemes. | Metaphorical meaning lost: '上车' and '跳车' are internet slang for 'joining a shared subscription' and 'leaving/exiting', not literal riding | Misinterpretation of slang. '上车' in this context (likely proxy/service subscription) means 'subscribed' or 'joined the service', not physically getting on a ride. '跑路' means the provider scammed/ran away. '跳车' means forced to leave the service. The literal translation 'on the ride' and 'jump' fails to convey the service subscription context. | Source is metaphor about subscription/share plan: 'joined (a shared plan) two days ago, owner disappeared, had to quit'; literal 'ride' and 'jump' misrepresent intended meaning. |
| google/gemma-3-4b-it | The car owner ran away just two days after getting on the vehicle, forcing me to bail. | Pass (no consensus errors) |
| google/translategemma-12b-it | The owner ran away after only two days of subscribing, forcing me to bail out. |
Major
[Accuracy]
"subscribing"
Reason: Added interpretation. '上车' means joining a group/service, 'subscribing' is specific but acceptable. | '上车' is slang for 'joining/subscribing to a service', not 'subscribing' in the traditional sense. It implies joining a group buy or proxy service. | “上车” means joining a group to take part in a shared resource or group buy; “车主” is the group leader/initiator, not the owner of a “subscription”; rendering it as “subscribing” departs from the original metaphorical context. |
| google/translategemma-4b-it | The car owner ran away just two days after joining, forcing me to bail out. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Just two days after getting into the car, the owner ran away. I was forced to bail out. |
Major
[Accuracy]
"getting into the car"
Reason: In context, “上车” mostly means getting on board/buying in/joining a project; translating it literally as physically getting into a car is easily misread as an actual ride. | Incorrect translation of internet slang. '上车' means 'joining/participating' (in a scheme or investment), '车主' is the 'promoter/organizer', '跑路' means 'disappeared with money/ran away', '跳车' means 'exiting the scheme'. The translation treats these as literal car-related terms, completely missing the financial scam context. | Literal translation of '上车' (joined a service/project) which is slang in this context, not literally entering a vehicle. |
| tencent/HY-MT1.5-7B | The owner fled just two days after getting in the car, forcing others to bail out. |
Major
[Accuracy]
"after getting in the car"
Reason: Mistranslation of '跳车'. In this context, it's internet slang meaning to exit/abandon an investment or scheme, not literally 'bail out'. Also '被迫跳车' means 'forced to exit', not 'forcing others to bail out'. | Literal translation of '上车'. In this context (likely investment or project participation), it means 'joined the project' or 'got on board', not physically entering a vehicle. | In context, “上车” mostly means “getting onto some ‘car’/project/slot”, i.e., joining a service or a shared account, not literally riding in a car. |
| Model under test | Machine translation (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Just got on the car yesterday and was banned today, failed. |
Major
[Accuracy]
"failed"
Reason: Here “翻车了” means “it crashed / the account got banned / things went wrong”, not simply “failed”; the tone and meaning are weakened | Mistranslation of '翻车' (overturned/crashed). In this context, it means the project failed spectacularly or the service was shut down, not just a generic 'failed'. | Inaccurate translation. '翻车' is internet slang meaning 'failed/crashed', but in this context it means the investment/service failed. | Mistranslation: '上车' is slang for 'joining/using a service', not literal 'get on car'. '封了' means 'blocked/banned', correct. '翻车了' means 'failed/got into trouble', correct |
| CohereLabs/tiny-aya-water | Just got on the car yesterday and today they banned me, I was banned. |
Major
[Accuracy]
"Just got on the car yesterday and today they banned me, I was banned"
Reason: 上车/翻车 are investment/scheme slang. 'got on the car' and 'I was banned' are incorrect. Should be 'just invested yesterday and today it crashed/failed'. | Misinterpretation: 封 refers to the account/service being blocked, not the user being banned | Literal translation of '上车'. In this context, it means subscribing to or starting to use a service, not physically entering a vehicle. |
| Qwen/Qwen2.5-14B-Instruct | Just got started yesterday and got banned today, failed. |
Major
[Accuracy]
"failed"
Reason: 翻车 is internet slang meaning 'crashed/failed spectacularly', not just 'failed'. 上车 refers to joining a scheme. The translation misses these specific slang meanings. | Here “翻车了” means “the car flipped/it collapsed”, i.e., things failed completely or ended in disaster, echoing the preceding “got banned”; rendering it on its own as “failed” does not convey the suddenness and severity of “翻车” and is slightly repetitive; it could be translated as “it totally backfired/crashed”. | '翻车' is specific slang for a project collapsing, getting banned, or failing spectacularly. 'Failed' is too weak and misses the sudden disaster connotation. |
| Qwen/Qwen2.5-7B-Instruct | Yesterday I just got on the car and today it's banned, flipped. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Just got on board yesterday and got banned today, failed. |
Major
[Accuracy]
"failed"
Reason: “翻车了” means “it completely fell apart/got botched/crashed”, and the scenario here is a service being banned; “failed” does not accurately convey the sense of serious failure in “翻车”; a more natural rendering would be “it was a disaster / it backfired / it totally went wrong” | '翻车' specifically means a service collapsed or a plan backfired disastrously; 'failed' is too generic and weak. | '翻车' means the plan/investment failed or was exposed/banned, not just generic 'failed'. Specific meaning of exposure/ban lost |
| Qwen/Qwen3.5-4B | Got on the car yesterday and got banned today. Failed. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Just got on board yesterday and was banned today; failed. |
Minor
[Accuracy]
"failed"
Reason: In online usage, “翻车了” means “crashed/ran into trouble/failed miserably”; “failed” is too weak and does not convey the sense of something unexpectedly going wrong. | '翻车' (flipped car) specifically implies a sudden failure or disaster after initial success, 'failed' is too generic. | 翻车 means the scheme/investment failed or was exposed, not just 'banned'. Context suggests account/service was shut down. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Yesterday I just got on the car, and today it's already banned. It crashed. |
Major
[Accuracy]
"Yesterday I just got on the car, and today it's already banned. It crashed."
Reason: The term '翻车' is a slang term that means 'failure' or 'disaster', not a literal car crash. | Literal translation of '上车' (joined the service/group buy) instead of the idiomatic meaning. | “上车”, “封了”, and “翻车” are metaphors about an account/service being banned and falling apart. The translation renders them literally as “got on the car” and “It crashed”, which is easily read as a physical vehicle and a physical accident rather than a service failing; the semantics are misaligned. | Incorrect translation of '上车' (getting access), '封' (banned/blocked), and '翻车' (failed/crashed). The metaphorical meaning is lost. |
| google/gemma-3-12b-it | It was just unblocked yesterday and blocked today; it failed. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Yesterday I just got on the train and it was banned, crashed. |
Major
[Accuracy]
"train"
Reason: '上车' in this context is joining a shared service/plan, not literally getting on a train. | '上车' is not 'got on the train' but internet slang for joining a shared account. '翻车' means the scheme failed/was exposed, not just 'crashed'. Translation is too literal. | Metaphorical meaning lost: '上车' and '翻车' are internet slang for subscription-related actions, not literal train | Misinterpretation of slang. '上车' here means subscribing to a service, not boarding a train. '翻车' means the service failed or was banned, translated literally as 'crashed' which is ambiguous (could mean vehicle crash). The entire sentence fails to convey the context of a service failure. |
| google/gemma-3-4b-it | The account was banned today after just getting on board. | Pass (no consensus errors) |
| google/translategemma-12b-it | It was just activated yesterday, but it's already banned today. It failed. | Pass (no consensus errors) |
| google/translategemma-4b-it | The account was banned yesterday and already suspended today, resulting in a failure. |
Major
[Accuracy]
"The account was banned yesterday and already suspended today"
Reason: Severe misinterpretation of time and event. Source: 'Yesterday [I] got on, today [it] got banned'. Translation says: 'Banned yesterday, suspended today'. This reverses the timeline and duplicates the negative event. | Incorrect translation: '上车' means 'to join' (a service/group buy), '封了' means 'was blocked/banned', and '翻车了' is slang for 'failed' or 'went wrong'. The translation incorrectly states the account was banned yesterday and suspended today, which is not the source meaning. | Original: boarded yesterday, got banned today; translation changes subject from the service to “the account” and duplicates ban/suspend across two days, distorting the timeline. |
| tencent/HY-MT1.5-1.8B | Translation of '昨天刚上车今天就封了,翻车了.' in en |
Critical
[Accuracy]
"Translation of '昨天刚上车今天就封了,翻车了.' in en"
Reason: Hypothesis is a meta-description rather than an actual translation. No translation provided. | No actual translation is provided, only the description “Translation of … in en”; the meaning of the source is not conveyed at all. | The hypothesis is a placeholder string rather than an actual translation. It literally says 'Translation of [source] in en' instead of providing the translation content. |
| tencent/HY-MT1.5-7B | "en": "Yesterday I just got on the car, and today it was banned. My account has been ‘blocked’.", "zh-TW": "昨天我刚上車,今天就被封了。我的賬戶被‘封禁’了。”, "ja": "昨日私はちょうどその車に乗ったのに、今日アカウントが“ブロック”されてしまった。」, "ko": "어제 그 차에 탔는데 오늘 바로 차단되었습니다. 내 계정이 ‘차단’되었네요。”, "de": "Gestern bin ich erst in das Auto gestiegen, und heute wurde es gesperrt. Mein Account ist ‘gesperrt’ worden.“, "fr": "Hierzu habe ich gestern erst angefangen, im Auto mitzufahren, und heute wurde mein Account ‘suspendu’.“, "it": "Ieri ho appena iniziato a usare quell’auto, e oggi il mio account è stato ‘sospeso’."}|im_end|> |
Critical
[Accuracy]
"Yesterday I just got on the car"
Reason: The hypothesis contains raw model output including multiple languages and JSON formatting artifacts instead of a clean Chinese translation. | Hypothesis contains malformed multilingual output with mixed language tags and formatting errors. The translation also misses that '翻车' is internet slang meaning 'things went wrong/failed', not literal account blocking. | “刚上车” means having just joined some project/plan/slot, not actually getting into a car. |
| Model under test | Machine translation (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Hakiem's recent behavior seems to indicate a stealth nerf. |
Major
[Accuracy]
"seems to indicate a stealth nerf"
Reason: Completely wrong: '哈基米' is a transliteration of 'hedgehog' (from Japanese) used as internet slang for a specific person/content, not 'Hakiem'. '降智' means 'becoming stupid/dumb' not 'stealth nerf' | '降智' literally means 'intelligence reduced' or 'acting dumb'. While 'nerf' is close in gaming context, 'acting dumb' or 'lost its mind' is more accurate to the source text. | “疯狂降智” is a personifying way of saying “getting dumber/IQ dropping like crazy”, not the game-balance term “being stealth-nerfed” | Mistranslation. '降智' means 'became stupid/lost intelligence', not 'stealth nerf'. Should be 'seems to have become less intelligent/capable recently'. |
| CohereLabs/tiny-aya-water | I feel like Hakiem has gone through a silent degradation. |
Major
[Accuracy]
"gone through a silent degradation"
Reason: Misinterpretation: 降智 means 'dumbing down' or 'becoming stupid', not 'degradation' | The slang '疯狂降智' is not fully captured, potentially altering the intended meaning. | 降智 (lower intelligence) is slang for degraded quality/performance. 'silent degradation' is awkward; should be 'lost quality' or 'degraded in performance'. | Mistranslation of '降智' (reduce intelligence/stupidify). The model became dumber, not silently degraded in a technical sense. | “疯狂降智了” means “has become really dumb/IQ plummeting” and leans toward subjective evaluation; “silent degradation” carries the technical sense of a quiet downgrade, so the tone and emotional color do not match. |
| Qwen/Qwen2.5-14B-Instruct | It feels like Hakimi has been nerfed recently. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Feeling that Hacemi has been severely nerfed recently. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | It feels like Hachimi has been massively nerfed recently. | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Feels like HachiMi has been shadowbanned lately. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Hajimi seems to have been nerfed lately. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The feeling is that Haki has been crazy with downgrading AI models recently. |
Major
[Accuracy]
"Haki has been crazy with downgrading AI models recently."
Reason: Within this community, “哈基米” mostly refers to a specific model/service or a nickname, and is not an agent performing “downgrading AI models”; “疯狂降智了” describes its behavior getting dumber. The translation adds the extra scope and causality of “downgrading AI models”, distorting the original meaning. | The term '降智' is a slang term that might not be accurately translated as 'downgrading AI models'. | Incorrect translation of '哈基米' (a cat meme term, not a name). The slang meaning of '降智' (becoming stupid/dumbing down) is mistranslated as 'downgrading AI models'. |
| google/gemma-3-12b-it | I feel like Hakimi has been severely nerfed recently. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Feeling like recently Hakimi is crazy about lowering intelligence. |
Minor
[Accuracy]
"is crazy about lowering intelligence"
Reason: '降智' is internet slang meaning 'acting stupid/making poor decisions'. The translation is grammatically broken and doesn't convey that Hakimi (likely a player) has been performing poorly or making bad decisions. | Incorrect structure; should be 'seems to be acting much dumber lately' or 'keeps losing IQ recently', not 'crazy about lowering intelligence'. | Incorrect: '哈基米' is internet slang for a specific service/thing (often AI or a platform), not a person's name 'Hakimi'. '降智' means 'becoming dumb/less intelligent', not 'lowering intelligence' as an action | Complete misunderstanding of slang. '哈基米' (Hakimi) is internet slang often referring to cats or cute things, sometimes used ironically for AI or services. '降智' means the content or service has become stupid/low quality. The translation treats 'Hakimi' as a proper noun acting intentionally to lower intelligence, which is nonsensical. |
| google/gemma-3-4b-it | It feels like Hakim is going crazy with shadowbanning recently. |
Major
[Accuracy]
"Hakim"
Reason: Incorrect translation: '哈基米' is a transliteration of 'hachimi' (はちみ), a Japanese internet slang for catgirls/cats, not a person's name 'Hakim'. Also '降智' means 'dumbing down' or 'becoming stupid', not 'shadowbanning'. | Mistranslation of '哈基米'. Originally a meme sound associated with cats, in tech slang it often refers to AI models (like Llama via phonetic association or specific community jargon) or is used ironically. 'Hakim' is a proper name unrelated to the context. | In the relevant circles, “哈基米” is mostly a nickname or personified name for some platform/service; rendering it as the personal name “Hakim” with no explanation is easily misread as a real person.
Major
[Accuracy]
"going crazy with shadowbanning"
Reason: Mistranslation of '降智' (lowering intelligence/dumbing down). This refers to model performance degradation, not shadowbanning. | “疯狂降智” is a personified way of saying “getting dumber”, which is completely different in meaning from “shadowbanning”. | Incorrect: '降智' means 'dumbing down' or 'becoming stupid/irrational', not 'shadowbanning'. |
| google/translategemma-12b-it | I feel like Hakimi has been experiencing significant performance degradation recently. |
Critical
[Accuracy]
"Hakimi"
Reason: '哈基米' is a slang term (likely from Japanese 'hajime' or internet meme), not the footballer Hakimi. It refers to a service, tool, or entity in the context, completely mistranslated as a person's name. | Mistranslation of '哈基米'. While originally a song about a cat, in tech/AI contexts it is often a playful nickname for specific AI models (like Llama) or services. Translating it as the proper name 'Hakimi' (a human name) misses the meme/codename reference. | Here “哈基米” is a nickname for a large model/service rather than a footballer; it should not be translated as a specific person's name. |
| google/translategemma-4b-it | It seems like Hakim is experiencing rapid 'shadowbanning'. |
Major
[Accuracy]
"'shadowbanning'"
Reason: The term 'shadowbanning' may not fully capture the intended meaning of '降智'. | Incorrect: '降智' means 'becoming dumb/stupid' or 'dumbing down', not 'shadowbanning'. The term refers to the AI model becoming less intelligent or making dumb mistakes. | “疯狂降智” means “getting much dumber/losing intelligence badly”; ‘shadowbanning’ is a completely different concept. | '降智' literally means 'intelligence reduction' (becoming dumb/slow). 'Shadowbanning' is a visibility restriction, which is a different technical concept. |
| tencent/HY-MT1.5-1.8B | The feeling that Hakimi has recently become extremely intelligent is crazy. |
Critical
[Accuracy]
"has recently become extremely intelligent"
Reason: “疯狂降智了” means getting dumber, not smarter; the direction is reversed. | 降智 means 'loss of intelligence/becoming stupid', not 'becoming intelligent'. The translation is completely inverted in meaning. | Completely incorrect translation. '哈基米' is internet slang for 'honey' (from Japanese 蜂蜜/hachimi), used here as a term of endearment for a pet or AI assistant. '降智' means 'dumbing down/reducing intelligence'. The translation gets the meaning completely backwards. | Incorrect translation of '降智' as 'become extremely intelligent' instead of 'lower intelligence'. | Complete inversion of meaning. '降智' means 'intelligence reduced' or 'becoming stupid', but the translation says 'extremely intelligent'. |
| tencent/HY-MT1.5-7B | I feel that Hakimi has been drastically reducing his intelligence recently. | Pass (no consensus errors) |
| Model under test | Machine translation (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | I am here to help you. |
Critical
[Accuracy]
"I am here to help you."
Reason: “出一个哈基米车位” means selling/releasing a slot on a 哈基米 account, and has nothing to do with “here to help you” | The translation does not match the source content. | Complete hallucination. The source means 'Selling one Hachimi slot/spot'. The hypothesis has no relation to the source meaning. | Complete mistranslation. '出一个哈基米车位' means 'selling/offering a Hakimi slot', not 'I am here to help you'. | Completely unrelated translation: '出一个哈基米车位' means 'Selling a Hakiemi slot' (a service slot), not 'I am here to help you' |
| CohereLabs/tiny-aya-water | Provide a Haki car position |
Minor
[Accuracy]
"Provide a Haki car position"
Reason: “出一个哈基米车位” means transferring/selling a shared-subscription slot on a 哈基米 account; “Haki car position” translates “车位” literally and does not convey the metaphorical meaning of an account-sharing slot. | Nonsensical literal translation. '哈基米车位' refers to a slot/spot for the specific service or meme item ('Haji Mi'), not a physical parking spot for a car named Haki. | 车位 is slang for account/subscription slot, not 'car position'. Should be 'Selling a Haki account/slot'. | Wrong transliteration: 哈基米 (hakimi) is a Japanese term, not 'Haki' |
| Qwen/Qwen2.5-14B-Instruct | Generate a Hakimi parking space. |
Major
[Accuracy]
"Generate a Hakimi parking space."
Reason: '出' in this context means 'selling' or 'providing', not 'generate' (create). | Within this community, “出一个…车位” means “selling/transferring an account slot/shared seat”, not generating a physical “parking space”; the metaphorical usage is misunderstood as a whole, and it should be “Offering a Hakimi slot/seat”. | 出 means 'offer/sell', not 'generate'. 哈基米车位 is a specific service offering (likely named after Hakimi). Should be 'Offering a Hakimi slot' or similar. |
| Qwen/Qwen2.5-7B-Instruct | Get a Hacme parking spot. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Offer a Hachimi parking space. |
Major
[Accuracy]
"Hachimi parking space"
Reason: Complete misunderstanding of slang. '哈基米车位' likely refers to a slot for a specific service or meme-related item, not a literal parking space for a cat/meme named Hachimi. | “哈基米车位” is jargon for a slot in a shared account/service, not a physical “parking space”; it should be rendered as something like “a Hachimi account slot / seat” | '车位' in this context is slang for account/slot in a service, not literal 'parking space'. Metaphorical meaning not conveyed |
| Qwen/Qwen3.5-4B | Reserve a Hahimi parking spot. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Create a Hachimi parking spot. |
Major
[Accuracy]
"Create"
Reason: '出' means 'selling/offering', not 'create'. | 出 means 'selling/offering', not 'create'. Should be 'Selling a Hachimi parking spot' or 'Offering a Hachimi slot'. | In “出一个…车位”, “出” means to transfer/sell, not “create”. | Incorrect translation: '出' means 'to sell' or 'offer for sale', not 'create'. Also, '哈基米' (hachimi) likely refers to a service or product, not a name, and '车位' refers to a slot or spot in a subscription/service, not a literal parking spot. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Issue a parking ticket for a Hackney carriage. |
Critical
[Accuracy]
"Issue a parking ticket for a Hackney carriage."
Reason: “出一个哈基米车位” means transferring a shared slot for some service/account, and has nothing to do with a “parking ticket” or a “Hackney carriage”; the meaning departs completely from the source sentence, and the proper name “哈基米” is misunderstood. | Completely incorrect translation. '哈基米' is a cat meme term, not 'Hackney carriage'. '车位' here means subscription slot, not parking space. | Complete hallucination/mistranslation. '出' means selling, '哈基米' is a nickname (cat/AI), '车位' is a subscription slot. The translation interprets it as a legal traffic violation. |
| google/gemma-3-12b-it | Find a Hakimi parking spot. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Get a parking space. |
Critical
[Accuracy]
"Get a parking space."
Reason: Complete failure to understand context. '哈基米车位' refers to a slot in a specific service (likely related to the previous 'Hakimi' context), not a physical parking space for a car. '出' means 'selling/offering', not 'Get' (which implies buying/requesting). | The translation does not accurately capture the slang and context of the source text. | Missing critical context. '哈基米车位' refers to a shared account slot for a specific service/account (Hakimi-related), not a literal parking space. Translation completely misses the internet slang meaning. | '哈基米车位' here refers to a share slot/account for 'Hakimi' (likely a service), not a literal parking space, and '出一个' means offering/selling one. | Completely wrong: '哈基米车位' refers to a shared subscription slot for a specific service, not a literal parking space |
| google/gemma-3-4b-it | Please reserve a Hakim parking spot. |
Major
[Accuracy]
"Please reserve a Hakim parking spot."
Reason: Incorrect translation: '哈基米' is a transliteration of 'hachimi' (はちみ), Japanese internet slang for catgirls/cats, not a person's name 'Hakim'. '车位' here means 'slot' or 'account slot' in the context of shared accounts, not literal parking. | “出一个…车位” means “transferring/selling a shared seat/account slot”, not “reserving a parking spot”; nor should “哈基米” be rendered literally as a personal name plus a parking spot. | Mistranslation of '哈基米' (see item 7). Likely refers to a specific AI model or service, not a person named Hakim. |
| google/translategemma-12b-it | Find a Hakimi parking spot. |
Critical
[Accuracy]
"Find a Hakimi parking spot."
Reason: Literal mistranslation. '哈基米车位' refers to a slot in a shared AI service (using the codename 'Hakimi'), not a physical parking spot for an entity named Hakimi. | “出一个哈基米车位” means “offering up a shared slot on a 哈基米 account”, not “looking for a parking spot”; “出” is sell/offer, not find. | '哈基米' is slang (same as test 7), likely a service or tool. '车位' means 'slot/spot' in the context of a group buy or subscription service, not literal parking. |
| google/translategemma-4b-it | Find a Hakimiy parking spot. |
Major
[Accuracy]
"Find a Hakimiy parking spot"
Reason: “出一个哈基米车位” means offering/selling a Hakimi slot/account, not searching for a parking spot; misreads both “出” and “车位” in this context. | Hallucinated proper noun. '哈基米' is a meme term, not a place or entity named 'Hakimiy'. | Incorrect translation: '哈基米' is internet slang for 'cat', and '车位' means 'slot/spot' in the context of group buys or services. The source is about selling a 'cat' (service) slot, not a 'Hakimiy parking spot'. |
| tencent/HY-MT1.5-1.8B | Exactly one haki item must be placed in the parking space. |
Critical
[Accuracy]
"Exactly one haki item must be placed in the parking space."
Reason: The sentence structure and meaning are completely misunderstood. “出一个哈基米车位” means selling/transferring a slot, and has nothing to do with “exactly one haki item” or a “parking space”. | Complete hallucination. '出一个哈基米车位' means 'Selling/Offering one 'Hakimi' (slang for cat/cute thing) slot/service'. The translation invents a rule about placing items. | 哈基米 is a proper noun (likely a player name/reference). 车位 means parking space. The translation is nonsensical and doesn't convey the actual meaning of selling a parking space. | Incorrect translation. '哈基米' is internet slang for 'honey' (pet name), and '车位' here likely refers to a spot/position (like in a group buy or waiting list), not a literal parking space. The translation treats '哈基米' as 'haki' (a name) and '车位' literally. |
| tencent/HY-MT1.5-7B | Provide a Haki Mi parking space. |
Critical
[Accuracy]
"Provide a Haki Mi parking space."
Reason: “哈基米车位” usually refers to an account slot/subscription seat or the like associated with that name; rendering it literally as a transliterated personal name plus a physical “parking space” does not fit the online context. | Mistranslation. '哈基米' is internet slang for a type of service/product, not a proper name. '车位' means parking space but in this context refers to a service slot/offering. The translation fails to convey the slang meaning. | Complete misunderstanding of slang. '哈基米车位' likely refers to a specific type of server slot or resource named after the meme, or a metaphorical 'spot' for something cute/small. Translating as a physical parking space for a person named Haki Mi is nonsensical. |
| Model under test | Machine translation (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The term '反代' refers to a reverse proxy setup. |
Major
[Accuracy]
"The term '反代' refers to a reverse proxy setup."
Reason: Incomplete translation. The hypothesis only translates one term but doesn't translate the full sentence. Should translate the entire question about how to set up reverse proxy. | Wrong translation approach: The source asks a question about how to do '反代' (reverse proxy), not a definition of the term | The source sentence asks for advice on “how to reverse-proxy 反重力” and says that problems were encountered; it is not defining a term. The core information of “asking for help/running into problems” is omitted | The hypothesis provides a definition instead of translating the user's question. The source asks 'How to set up a reverse proxy for Anti-Gravity?', not 'What does reverse proxy mean?'. | The translation does not address the context of '反重力'. |
| CohereLabs/tiny-aya-water | Guys, how do you reverse-proxy gravity? I ran into some issues while trying to. |
Minor
[Accuracy]
"reverse-proxy gravity"
Reason: In “反重力怎么反代呢”, “反重力” is a site/service name and “反代” refers to reverse-proxying; translating it as “reverse-proxy gravity” mistakes the site name for literal “gravity”, causing a semantic deviation. | Literal translation failure. '反重力' (Anti-gravity) is likely a codename for a specific software or tool in this community, not the physical force of gravity. One cannot reverse-proxy physics. | Misinterpretation: 反重力 is internet slang for anti-censorship/GFW bypass techniques, not literal gravity | 反重力 is likely a product/service name, not literal gravity. 'reverse-proxy gravity' is nonsensical. Should identify what 反重力 actually refers to. |
| Qwen/Qwen2.5-14B-Instruct | Friends, how do you reverse proxy anti-gravity? I encountered some issues during the attempt. |
Major
[Accuracy]
"anti-gravity"
Reason: “反重力” appears to be a website/project name or a specific service; the literal translation “anti-gravity” is easily read as a physics concept and does not preserve its proper-noun character. | '反重力' is likely a codename for a specific proxy tool or protocol in this context, not the physical concept of anti-gravity. Literal translation makes the sentence nonsensical. | 反重力 is likely a specific tool/service name in this technical community, not literal 'anti-gravity'. 反代 means 'reverse proxy'. The translation treats it as literal rather than a proper noun. |
| Qwen/Qwen2.5-7B-Instruct | Friends, how do you reverse proxy anti-gravity? I encountered some issues while trying. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Friends, how to reverse proxy against gravity, encountered some problems during attempts. |
Critical
[Accuracy]
"reverse proxy against gravity"
Reason: '反重力' is likely a service/product name, not literal 'against gravity'. '反代' means 'reverse proxy'. Translation is nonsensical | “反重力怎么反代呢” is wordplay on “反代” (reverse proxy) and “重力” (Gravity, a website/service name), not a literal reverse proxy of “gravity”; it should be translated as “how to set up a reverse proxy for Gravity” or similar | Literal translation of '反重力' (Anti-gravity), which is likely a codename for a specific proxy tool or protocol, not a physical action against gravity. |
| Qwen/Qwen3.5-4B | Folks, how do I reverse-proxy anti-gravity? I encountered some issues while trying. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Friends, how do you reverse proxy anti-gravity? I encountered some issues while trying. |
Major
[Accuracy]
"anti-gravity"
Reason: “反重力” is a transliteration/nickname of a specific service/website name, not the physics concept “anti-gravity”; the literal translation causes serious ambiguity. | '反重力' is likely a codename for a specific proxy tool or protocol, translating it literally as 'anti-gravity' makes the sentence nonsensical in a tech context. | 反重力 is likely a service/tool name, not literal 'anti-gravity'. 反代 means 'reverse proxy'. Translation conflates the two concepts. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Friends, how does reverse gravity work with reverse proxy? We encountered some issues while trying. |
Major
[Accuracy]
"reverse gravity"
Reason: Literal mistranslation of '反重力', which is a codename for a specific proxy tool, not the physical phenomenon. | Incorrect translation of '反重力' (anti-gravity, a VPN service name) and '反代' (reverse proxy). The technical slang is misunderstood. | “反重力” is a service/site name or proper noun, unrelated to physical “anti-gravity”; translating it as “reverse gravity” produces a wrong literal meaning. |
| google/gemma-3-12b-it | Hey folks, how do I reverse proxy anti-gravity? I’ve encountered some issues while trying. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Hello everyone, how do you reverse proxy? I am encountering some problems while trying. | Pass (no consensus errors) |
| google/gemma-3-4b-it | Friends, how do we reverse proxy anti-gravity? I'm encountering some problems while trying. | Pass (no consensus errors) |
| google/translategemma-12b-it | Hey everyone, how do you reverse proxy something with anti-gravity? I encountered some problems while trying. | Pass (no consensus errors) |
| google/translategemma-4b-it | Friends, how can I reverse proxy anti-gravity? I am encountering some problems while trying. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Lao youmen, how do we reverse gravity? I've encountered some problems while trying. |
Major
[Accuracy]
"reverse gravity"
Reason: Partial translation but '佬友' is transliterated as 'Lao youmen' which is awkward. '反重力' and '反代' are technical terms in a specific context (likely related to proxy/VPN technology). The translation doesn't capture the technical context and the phrase '反代' (reverse proxy) is not translated at all. | “反重力” is a proper noun or project name, and “怎么反代呢” concerns proxying/reverse-proxying; overall this is a technical configuration question, not physically “reversing gravity”. | Literal translation of '反重力' which is likely a codename for a specific tool or proxy method in this context, not actual physics. Even if literal, '反代' (reverse proxy) was ignored/misinterpreted as part of the gravity phrase. | 佬友 should be translated as 'guys/buddies' not transliterated as 'Lao youmen'. 反重力 is likely a technical term or project name that needs context. |
| tencent/HY-MT1.5-7B | "en": "Ladies and gentlemen, how can we reverse engineer anti-gravity technology? We have encountered some problems during our attempts.", "zh-TW": "各位朋友,如何反制反重力技术?我们在尝试过程中遇到了一些问题。", "ja":「皆さん、反重力技術をどのようにリバースエンジニアリングするか?試行中にいくつか問題に直面しました。」, "ko": "여러분, 반중력 기술을 어떻게 역설계할 수 있을까요? 시도하는 동안 몇 가지 문제가 발생했습니다。", "de": "Meine Damen und Herren, wie können wir die Anti-Gravitationstechnologie reverse-engineern? Während unserer Versuche sind einige Probleme aufgetreten。", "fr": "Chers amis, comment pouvons-nous effectuer un reverse engineering sur la technologie anti-gravité ? Nous avons rencontré quelques problèmes au cours de nos tentatives。", "it":"Cari amici, come possiamo effettuare un reverse engineering sulla tecnologia anti-gravità? Durante i nostri tentativi abbiamo riscontrato alcuni problemi."|im_end|> |
Critical
[Accuracy]
"Ladies and gentlemen"
Reason: The hypothesis contains raw model output including multiple languages and JSON formatting artifacts instead of a clean Chinese translation. | Hypothesis contains malformed multilingual output with mixed language tags and formatting errors. The translation also misses that '反重力' and '反代' are internet slang terms, not literal reverse engineering of anti-gravity technology. | “佬友们” is a form of address for forum veterans/fellow users, not the “ladies and gentlemen” of a formal speech. |
| Model under test | Machine translation (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | "en": "I bought a cheap VPS with 1 CPU and 1GB RAM on Black Friday, specifically for running the probe.", " " "zh-TW": "我於黑色星期五購買了一台價格實惠的 VPS,有 1 核和 1GB 内存,專門用來運行探針。", " " "ja": "ブラックフライデーに、1CPUと1GBのメモリを持つ cheap VPS を購入しました。探針を実行するために特別に使いました。", " " "ko": "블랙 프라이데이 때 1코어 1GB의 저렴한 VPS를 샀어요. 탐사기를 실행하기 위해 특별히 사용했어요.", " " "de": "Ich habe an Black Friday einen billigen VPS mit 1 CPU und 1 GB RAM gekauft. Er sollte speziell zum Ausführen des Probes verwendet werden.", " " "fr": "J'ai acheté un VPS bon marché avec 1 CPU et 1 Go de RAM le jour du Black Friday. Il était destiné à être utilisé pour exécuter la probe.", " " "it": "Il Black Friday ho comprato un VPS economico con 1 nucleo e 1 GB di RAM. Doveva essere usato per eseguire il probe." } | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | "en": "I bought a cheap VPS with 1 CPU and 1GB RAM on Black Friday, specifically for running the probe.", " "zh-TW": "我黑五買了一個 1核1G 的小主機,專門用來運行探針。", " "ja": "ブラックフライデーに、1CPUと1GBの安価で小型なサーバーを購入し、探針を実行するために使用しました。", " "ko": "블랙 프라이데이 때 1코어 1GB의 저렴한 가상 서버를 샀어요, 탐지 프로그램을 실행하기 위해서요.", " "de": "Ich habe an Black Friday einen billigen VPS mit 1 CPU und 1 GB RAM gekauft, speziell für das Ausführen des Probes.", " "fr": "J'ai acheté un VPS bon marché avec 1 CPU et 1 Go de RAM le jour du Black Friday, spécifiquement pour exécuter la sonde.", " "it": "Ho comprato un VPS economico con 1 CPU e 1 GB di RAM il giorno del Black Friday, specificamente per eseguire il probe." } | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Bought a low-end server with 1 core and 1GB for running probes on Black Friday. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Black Friday bought a 1-core-1GB cheap VPS, specifically used for running probes. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Bought a 1-core 1GB VPS for Black Friday, specifically for running probes. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Bought a 1-core 1GB little chicken on Black Friday, specifically for running probes. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Bought a 1-core 1GB cheap VPS on Black Friday, specifically for running probes. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | He bought a 1-core 1GB cheap VPS specifically to run scans. | 通过 (无共识错误) |
| google/gemma-3-12b-it | I bought a cheap VPS with 1 core and 1GB during Black Friday specifically for running probes. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Black five bought a cheap VPS with 1 core and 1GB RAM, specifically for probing. | 通过 (无共识错误) |
| google/gemma-3-4b-it | I bought a cheap VPS with 1 core and 1GB of RAM specifically for running probes. | 通过 (无共识错误) |
| google/translategemma-12b-it | I bought a cheap VPS with 1 core and 1GB of RAM during Black Friday, specifically for running probes. | 通过 (无共识错误) |
| google/translategemma-4b-it | I bought a cheap VPS with 1 core and 1GB of RAM specifically for running probes. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '黑五买了个 1核1G 的小鸡,专门用来跑探针.' in en", "zh-TW": "Translation of '黑五买了個 1核1G 的小雞,專門用來跑探針.' in zh-TW", "ja": "Translation of '黒五买了個 1核1G 的小雞,専門用來跑探針.' in ja", "ko": "Translation of '검은색 다섯에 1코어 1GB의 작은 닭을 샀는데, 특별히 탐침을 실행하기 위해 사용했다.' in ko", "de": "Translation of 'Schwarz fünf hat einen 1-CPU-1-GB-Küken gekauft, der speziell zum Ausführen von Sensoren verwendet wurde.' in de", "fr": "Translation of 'Noir cinq a acheté un petit poulet de 1 CPU et 1 Go, utilisé spécifiquement pour exécuter des capteurs.' in fr", "it": "Translation of 'Nero cinque ha comprato una piccola gallina di 1 core e 1 GB, utilizzata appositamente per eseguire sensori.' in it} |
致命
[准确性]
""en": "Translation of '黑五买了个 1核1G 的小鸡,专门用来跑探针.' in en""
理由: No actual translation is given, only a descriptive string. | The hypothesis is a placeholder string rather than an actual translation. It literally says 'Translation of [source] in en' instead of providing the translation content. | Hypothesis contains meta-description and corrupted multilingual text instead of actual translation. |
| tencent/HY-MT1.5-7B | I bought a 1-core 1GB small VPS for specifically running probes. | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Please recommend a budget-friendly streaming media device with easy installation. |
严重
[准确性]
"streaming media device"
理由: Mistranslation: '解锁流媒体' means 'unblock streaming media' (like Netflix, etc.), not 'streaming media device'. '落地机' refers to a VPS/server, not a physical device | “落地机” refers to an exit VPS/server used to unlock streaming services, not a “streaming media device” | Mistranslation. '解锁流媒体的便宜落地机' means 'cheap VPS/server that can unlock streaming services', not 'streaming media device with easy installation'.
严重
[准确性]
"with easy installation"
理由: The source says only “便宜落地机” and never mentions “easy installation” | Addition: 'easy installation' not in source - '落地机' means 'deployed server' | Mistranslation of '落地' (ready-to-use/out-of-the-box) as 'easy installation'. The source implies a device that works immediately without complex configuration, not necessarily one that is easy to install physically. |
| CohereLabs/tiny-aya-water | Suggest a budget-friendly streaming media player with easy installation. |
严重
[准确性]
"streaming media player"
理由: In “解锁流媒体的便宜落地机”, “落地机” refers to an exit server/VPS used to get around streaming region locks, not an audio/video playback device; mistranslating it as “media player” changes the category of the object. | '落地机' in this context (unlocking streaming) refers to a VPS or server located in a specific region, not a consumer hardware media player. | 解锁流媒体的便宜落地机 refers to cheap VPS/server that can unlock streaming services, not a 'player'. 落地机 is server/VPS slang. | Completely wrong: 落地机 refers to a VPS/server for bypassing geo-restrictions, not a media player
严重
[准确性]
"with easy installation"
理由: Added content: 'easy installation' not present in source | '落地' (landing) refers to the server's location/IP residency, not the ease of installation. | The source says nothing about ease of installation; “专门用来解锁流媒体” was misread as “easy to install”, which adds unsupported information. |
| Qwen/Qwen2.5-14B-Instruct | Recommend a cheap streaming server to unlock media. |
轻微
[准确性]
"streaming server to unlock media"
理由: “解锁流媒体的便宜落地机” usually means an exit VPS/node that can unlock streaming platforms; the translation is understandable, but “unlock media” is slightly vague and “to unlock streaming services” would be closer. | 落地机 refers to a server in a specific geographic location (for bypassing geo-restrictions). 解锁流媒体 means 'unlock streaming services'. The translation is imprecise about the geographic/location aspect. | '落地机' specifically refers to a server located in the target country (native IP) to bypass geo-restrictions. 'Streaming server' is too generic. |
| Qwen/Qwen2.5-7B-Instruct | Recommend an affordable VPS for streaming media unlocking. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Looking for an affordable set-top box to unlock streaming services. |
严重
[准确性]
"set-top box"
理由: '落地机' in this context refers to a VPS/server that can unlock streaming, not a physical set-top box. Mistranslation of technical slang | '落地机' in this context usually refers to a local server or VPS located in a specific region to bypass restrictions, not necessarily a consumer 'set-top box'. | In circumvention circles “落地机” mostly means an exit server or a machine in an exit data center, not a set-top box; it should be “cheap downstream server / exit node / landing server” or similar, so “set-top box” is a semantic error |
| Qwen/Qwen3.5-4B | Recommend an affordable media streaming device. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Recommend an affordable streaming device that works out-of-the-box. |
严重
[准确性]
"streaming device"
理由: Here “落地机” is an exit server/VPS used to get around streaming region locks, not a hardware streaming playback device. The meaning deviates substantially. | '落地机' in this context refers to a VPS/server located in a specific region ('landed'), not a consumer 'streaming device' like a Roku. | Incorrect translation: '落地机' in this context refers to a VPS or server (virtual private server), not a streaming device. The term '落地机' is internet slang for a server that is physically located in a specific region. | 落地机 in this context means a VPS/server that can unlock streaming services, not a physical device. Should be 'VPS for unlocking streaming' or similar. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Recommend an affordable landing aircraft for unlocking streaming services. |
致命
[准确性]
"landing aircraft"
理由: Incorrect translation of '落地机' (VPS/server located in a specific country). This is a VPS term, not an actual aircraft. | Literal mistranslation of '落地机'. In this context, it means a VPS/server located in a specific country (native IP), not an airplane. | “落地机” is an exit server/VPS used to unlock streaming services and has nothing to do with real aircraft. Rendering it as “landing aircraft” completely misreads the technical context and the object. |
| google/gemma-3-12b-it | Looking for recommendations for an affordable device to unlock streaming services. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Recommend a cheap landing pad for streaming media. |
轻微
[准确性]
"landing pad"
理由: Mistranslation of '落地机'. In this context, '落地机' refers to a server located in a specific country (to unlock geo-restricted media), not a 'landing pad' (for aircraft/spacecraft). The technical meaning is lost. | '落地机' in this context means an exit/endpoint server (proxy/VPS) in target region, not a 'landing pad'. | '落地机' is internet slang for a server/VPS that can unlock streaming services, not 'landing pad'. '解锁流媒体' means 'unlock streaming services'. Translation is inaccurate. | Completely wrong: '落地机' refers to a VPS/server located in a specific region for unblocking content, not a literal 'landing pad' |
| google/gemma-3-4b-it | Unable to connect to server |
致命
[准确性]
"Unable to connect to server"
理由: Complete hallucination. The translation bears no relation to the source text '求推荐一个解锁流媒体的便宜落地机' (Requesting a recommendation for a cheap native server that unlocks streaming media). | Hypothesis is a system error message, not a translation of the source text. Complete failure to translate. | The source “求推荐一个解锁流媒体的便宜落地机” is not translated at all; the content and meaning do not match it in any way. | Completely incorrect output: The hypothesis is an error message rather than a translation of the source text '求推荐一个解锁流媒体的便宜落地机' (Looking for a cheap server recommendation for streaming media unlock). | The translation is unrelated to the source text. |
| google/translategemma-12b-it | Looking for recommendations for an affordable device to unlock streaming services. |
严重
[准确性]
"device"
理由: Mistranslation of '落地机'. In this context, it refers to a VPS/server instance located in a specific country ('landed'), not a physical consumer 'device'. | “落地机” is an exit server/VPS, not a physical hardware device; it should be translated as “server / VPS” for unlocking streaming services. | '落地机' refers to a 'server/VPS' or 'endpoint', not a physical 'device'. In this context, it means a server to route traffic through. |
| google/translategemma-4b-it | Please recommend an affordable and reliable streaming device. |
严重
[准确性]
"reliable streaming device"
理由: “落地机” in this context is an exit node/server (VPS) used to unlock streaming, not a hardware streaming device; core object type is wrong. | '落地机' in this context refers to a VPS/server located in a specific country (to unlock geo-restricted streaming), not a physical hardware 'streaming device' like a Roku. | Incorrect translation: '落地机' in this context refers to a 'VPS/server' or 'dedicated machine' used as an exit node for proxy services, not a 'streaming device'. The term is related to '机场' (proxy service) terminology. |
| tencent/HY-MT1.5-1.8B | "en": "Translate '求推荐一个解锁流媒体的便宜落地机' into English", "zh-TW": "翻译‘求推荐一个解锁流媒体的便宜落地机’成中文”, "ja": "翻译‘求推荐一个解锁流媒体的便宜落地机’成日语”, "ko": "翻译‘求推荐一个解锁流媒体的便宜落地机’成韩语”, "de": "翻译‘求推荐一个解锁流媒体的便宜落地机’成德语”, "fr": "翻译‘求推荐一个解锁流媒体的便宜落地机’成法语”, "it": "翻译‘求推荐一个解锁流媒体的便宜落地机’成意大利语"} |
致命
[准确性]
""en": "Translate '求推荐一个解锁流媒体的便宜落地机' into English""
理由: Hypothesis is a meta-description in multiple languages rather than an actual translation. | There is no actual translation, only an instruction to translate the sentence. | The hypothesis is a placeholder string rather than an actual translation. It literally contains instructions to translate rather than the actual translation content. |
| tencent/HY-MT1.5-7B | Request a cheap floor stand for unlocking streaming media. |
致命
[准确性]
"floor stand"
理由: Here “落地机” means an overseas exit server or VPS for unlocking streaming services, and certainly not a floor stand or tripod. | Mistranslation. '落地机' is internet slang for a VPS/server (not a floor stand). '解锁流媒体' means to unlock streaming services. Should be 'Request a cheap VPS for unlocking streaming media services'. | Severe mistranslation of '落地机'. In this context, it refers to a 'bare metal server' or a 'local physical machine' (as opposed to a VM), not a piece of furniture like a floor stand. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Recent usage of certain VPN providers has been flagged as being under attack. |
严重
[准确性]
"Recent usage of certain VPN providers has been flagged as being under attack."
理由: “几家机场最近老是全红,好像被蓄意打了” says that all of these providers' lines have turned red and that they appear to have been maliciously attacked; “has been flagged as being under attack” omits the “全红” (all red) status, and its tone does not match the hedged “好像” (seems) | Mistranslation. '机场' (airport) is slang for VPN service. '全红' means 'all red' (indicating all nodes are down). Should be 'VPN services I use have been all down recently, seems intentionally attacked'. | Mistranslation of '全红' (all nodes showing red/unavailable) and '被蓄意打了' (deliberately targeted/blocked). The hypothesis misses the specific context of service outage due to targeting. | Partial mistranslation: '机场' is slang for 'VPN service provider', correct. '全红' means 'all red (all failing)' not 'under attack'. '被蓄意打了' means 'intentionally attacked/targeted', correct |
| CohereLabs/tiny-aya-water | The recent usage statistics for some providers have been showing up red all the time, as if they were being deliberately hit. |
严重
[准确性]
"The recent usage statistics for some providers"
理由: “使用的几家机场最近老是全红” means that the several providers I use have lately been showing all red on the panel; “usage statistics” is barely understandable, but it weakens the sense that every node is offline or unavailable. | 机场 (airport) is slang for VPN/proxy service provider. 'usage statistics showing red' is incorrect; should be 'providers have been down/offline'. | Misinterpretation of '机场' (Airport). In this context, it is slang for proxy service providers, not physical airports. '全红' (all red) refers to connection nodes failing, not usage stats. | Misinterpretation: 机场 refers to VPN service providers, not 'usage statistics'; 全红 means all nodes are down/unusable, not 'showing red' |
| Qwen/Qwen2.5-14B-Instruct | The proxy providers I've been using have recently always shown red, as if they were deliberately targeted. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | The proxy services I use are often all red lately, as if they were intentionally blocked. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | The several proxy providers I've been using lately have been all red, as if they were deliberately attacked. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | The few airports I've been using lately have all turned red; they seem to be intentionally blocked. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | The few proxy providers I use have recently been completely red, as if they were intentionally targeted. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The few airports used recently are often all red, as if intentionally targeted. | 通过 (无共识错误) |
| google/gemma-3-12b-it | The proxy providers I've been using have recently been consistently red, as if they were deliberately targeted. | 通过 (无共识错误) |
| google/gemma-3-1b-it | The recent airport is always red, as if it was deliberately attacked. |
严重
[准确性]
"The recent airport is always red"
理由: '机场' is internet slang for VPN service, not literal airport. '全红' means 'all nodes failing/all red (in status)', not 'always red' as a color | Misinterpretation of '机场'. In this context, '机场' is slang for a proxy service provider, not a physical airport. '全红' likely means all nodes are unavailable or blocked (red status), not literally the color red. The translation implies a physical airport was attacked. | '使用的几家机场最近老是全红' refers to several proxy service providers being all red (all nodes down) in monitoring; not a physical airport being red. | '机场' (airport) is internet slang for proxy/VPN service. '全红' means all nodes are down/red status. Translation completely misses the technical context of proxy services. |
| google/gemma-3-4b-it | Some of the proxy providers we use have been consistently showing red lately, as if they’ve been deliberately targeted. | 通过 (无共识错误) |
| google/translategemma-12b-it | Recently, several proxy providers I've been using have consistently shown as red (unavailable), almost as if they were deliberately targeted. | 通过 (无共识错误) |
| google/translategemma-4b-it | Some of the proxy providers I use have been showing all red lately, as if they have been deliberately targeted. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | The several airports used by us have been experiencing frequent red lights recently, as if they had been deliberately targeted. |
严重
[准确性]
"airports"
理由: Here “机场” means a network proxy provider (a proxy "ladder" or node service), not a physical airport; the literal translation is misleading. | Incorrect translation of internet slang. '机场' here refers to 'VPN service providers' (airline companies), not literal airports. '全红' means 'all red (all failing/down)', not literal red lights. The translation treats these as literal terms, losing the meaning about VPN services being down. | Literal translation of '机场'. In this context (proxy services), '机场' refers to a proxy service provider, not a physical airport.
严重
[准确性]
"experiencing frequent red lights"
理由: 全红 in this context refers to network/service status being completely down/red (not literal traffic lights). Should be 'all showing red status' or 'completely down'. | “全红” means the monitoring page or probe status is entirely red, indicating nodes that are offline or faulty; the “red lights” analogy is close but slightly off. | Misinterpretation of '全红'. In proxy contexts, this usually means all nodes are unavailable or blocked, not necessarily traffic lights. |
| tencent/HY-MT1.5-7B | The several airports being used recently have always shown red status, as if they were intentionally targeted. |
严重
[准确性]
"airports"
理由: In circumvention circles “机场” means a proxy service provider, not a physical airport; the literal rendering “airports” is seriously misleading. | Mistranslation of slang. '机场' in this context refers to proxy service providers, not physical aviation airports. 'Red status' refers to connection failures. | Mistranslation. '机场' is internet slang for proxy/VPN service (not literal airports). '全红' means all showing problems/down. Should be 'The proxies/VPNs I've been using recently have been all down, as if they were intentionally attacked'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Today visiting SHIT looks like it's not working, is it blocked? |
轻微
[流畅性]
"Today visiting SHIT looks like it's not working"
理由: Awkward phrasing. 'Today visiting SHIT looks like...' is grammatically clunky compared to 'It seems I can't access SHIT today'. | Awkward phrasing. Should be 'seems inaccessible' or 'can't access it'. | The phrasing is awkward; “Today, when visiting SHIT, it seems I can't get in / it doesn't work” would be more natural | Could be more natural: 'can't access' or 'is down' |
| CohereLabs/tiny-aya-water | It seems like SHIT isn't accessible today, is it blocked? |
严重
[准确性]
"It seems like SHIT isn't accessible today, is it blocked?"
理由: The overall meaning is basically correct, but “进不去了” can also cover access anomalies or connection failures in a broader sense, not only "inaccessible" at the HTTP level; the translation is slightly narrow, though this does not affect the main message. | Likely a codename or acronym in the source. Translating it as the English profanity 'SHIT' is likely incorrect unless the source explicitly meant the English word, which is rare in this syntax. It breaks the tone. | Inappropriate: SHIT is likely a placeholder for a website name, should be kept as is or properly anonymized |
| Qwen/Qwen2.5-14B-Instruct | Could not access SHIT today, is it blocked by the firewall? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Today accessing SHIT seems to be blocked, firewalled? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Today when I tried to access SHIT, I couldn't get in. Is it blocked by GFW? | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Today's access to SHIT seems blocked; was it walled? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Today's access to SHIT seems blocked; has it been firewalled? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Today's visit to SHIT seems like it's not getting through. Have you been blocked? |
严重
[准确性]
"Have you been blocked?"
理由: Shifted person. Source asks if the *site* is blocked ('被墙了吗'), translation asks if the *user* has been blocked. | The source is the speaker wondering “被墙了吗?”, that is, "has it been walled/blocked?", not asking whether the other party has been blocked; the person and the referent are wrong. | Incorrect subject. Should be 'it' (the website) has been blocked, not 'you'. Also 'SHIT' is a website name, not profanity. |
| google/gemma-3-12b-it | I can't seem to access SHIT today, has it been blocked by the GFW? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Today accessing SHIT seems like I can't go through, is it blocked? | 通过 (无共识错误) |
| google/gemma-3-4b-it | It seems I can’t access SHIT today; has it been blocked by GFW? | 通过 (无共识错误) |
| google/translategemma-12b-it | I can't seem to access SHIT today; is it blocked? | 通过 (无共识错误) |
| google/translategemma-4b-it | It seems like accessing SHIT today might be blocked, has it been censored? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Today, it seems that I can't access SHIT. Is there a firewall blocking me? |
严重
[准确性]
"Is there a firewall blocking me?"
理由: “被墙了吗” is more naturally “Is it blocked/censored?”; the current rendering, “blocking me”, is slightly too personal. | 被墙 is specific Chinese internet slang for 'blocked by the Great Firewall'. 'Firewall blocking' is too generic and doesn't capture the specific reference. | The translation correctly identifies '被墙' as firewall blocking, which is accurate. However, 'SHIT' appears to be a placeholder or censored term (likely a website name). The translation is acceptable but the source term 'SHIT' itself may be problematic. |
| tencent/HY-MT1.5-7B | "en": "Today I tried to access SHIT but couldn’t get in. Is it blocked?", "zh-TW": "今天我尝试访问SHIT但进不去。是被封锁了吗?", "ja": "今日、SHITにアクセスしようとしたのですが、入れませんでした。ブロックされているのでしょうか?", "ko": "오늘 SHIT에 접속하려고 했지만 들어갈 수 없었습니다. 차단된 것인가요?」|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | This price looks like a ghost car, isn't it? Even (kuai) is possible (tao). |
轻微
[准确性]
"ghost car"
理由: Mistranslation: '灵车' means 'scam/ripoff' (metaphorically a 'ghost car' that will leave you stranded), not literal 'ghost car'. The parenthetical (kuai)(tao) is a wordplay on '快逃' (run away quickly) | Poor translation of slang. '灵车' is internet slang for scam/fraud. '(kuai)(tao)' appears to be phonetic representation of '快逃' (run away quickly). Translation is incoherent. | Here “灵车” means a cheap machine that is unreliable and likely to fail, not a literal “ghost car” | Complete failure to translate the slang pun. '灵车' implies a scam/disaster. '(kuai) 行 (tao)' is a pun on '快逃' (run away quickly) hidden inside '还行' (it's okay). The hypothesis translates the pinyin literally, losing all meaning. |
| CohereLabs/tiny-aya-water | This price looks like a ghost car, isn't it? (kuai) Is it still running? (tao) |
严重
[准确性]
"ghost car"
理由: Misinterpretation: 灵车 refers to a 'scam' or 'poor quality service' in internet slang, not 'ghost car' | Here “灵车” is a metaphor for a cheap service that is highly unstable and could fail at any moment, not a “ghost car”; the literal translation leads to misunderstanding. | '灵车' (hearse) is slang for a service that is about to die or shut down soon. 'Ghost car' loses the specific connotation of impending death/failure. | 灵车 is slang for scam/fraud scheme. Pinyin annotations (kuai)(tao) are unexplained and incorrect. Should translate the actual meaning of the slang.
严重
[准确性]
"isn't it? (kuai) Is it still running? (tao)"
理由: Misinterpretation: 快 refers to 'fast/cheap', 逃 means 'escaping', the phrase indicates skepticism about the deal | Failed to interpret the wordplay. '还 (kuai) 行 (tao)' is a pun on '快逃' (run away quickly). The translation treats it as nonsensical phonetic annotations. | “还(kuai)行(tao)” is a homophone joke on “还行吧/还可以凑合” (it's okay / it will do); the current rendering misreads it as “快/跑” (fast / run), which completely changes the original meaning. |
| Qwen/Qwen2.5-14B-Instruct | At this price, hmm... it looks like a hearse, eh? |
严重
[准确性]
"it looks like a hearse, eh?"
理由: “这价格…像灵车吧,还(kuai)行(tao)” as a whole is self-mockery: the price looks like a 灵车 (likely to fail, risky), but the value for money is still acceptable; the translation keeps only “像灵车” (looks like a hearse) and entirely omits the contrasting turn “还行/还算可以” (it's okay). | The source contains a pun '(kuai) 行 (tao)' meaning 'soon to run away/scam'. The translation completely omits this crucial warning implied by the slang. | 灵车 is internet slang for 'scam/fraud scheme', not literal 'hearse'. 还(kuai)行(tao) appears to be phonetic slang that is completely untranslated. The translation misses critical meaning. |
| Qwen/Qwen2.5-7B-Instruct | This price...嘶...looks like a hearse, still kinda works. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | This price... It looks like a hearse, still (kuai) (tao). |
严重
[准确性]
"still (kuai) (tao)"
理由: Failed to convey the pun - '(kuai)行(tao)' sounds like '快逃' (run away quickly) indicating the price is too good to be true | “还(kuai)行(tao)” is a homophone pun made by deliberately breaking up “还行吧”, carrying a layered tone along the lines of “还凑合/还能套现” (it will do / can still cash out); keeping the pinyin without explanation loses the meaning | Incomplete translation. '还(kuai)行(tao)' appears to be intentional character splitting/obfuscation. Should be translated as a complete phrase, not left as pinyin fragments |
| Qwen/Qwen3.5-4B | This price, uh... looks like a hearse, but (okay). | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | This price, uh... looks like a hearse, but (kuai) okay (tao). |
严重
[准确性]
"but (kuai) okay (tao)"
理由: Romanized Chinese characters should not appear in English translation. 快套 is slang meaning 'quick scam/scheme'. Should translate the meaning, not transliterate. | Incorrect translation: The parenthetical (kuai)(tao) is a phonetic hint for the Chinese phrase '快逃' (run away quickly), indicating the price is so bad it's a scam. The translation should convey this meaning, not leave it as pinyin. | “还(kuai)行(tao)” is “还可以/还行吧” (it's okay) with a homophone pun; the current translation does not explain the pun, the English as a whole does not read smoothly, and the meaning is largely lost. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | This price, really? It seems like a ghost car, huh? |
严重
[准确性]
"It seems like a ghost car, huh?"
理由: Here “灵车” means a server or plan that is a very bad deal or a trap, not a “ghost car”. The literal “ghost car” loses the price and risk metaphor. | '灵车' is slang for a service that will die soon (funeral car). 'Ghost car' is a literal translation that misses the idiomatic warning. | Incorrect translation of '灵车' (spirit car/hearse, meaning a scam) and the pun '(kuai)行(tao)' is completely lost. |
| google/gemma-3-12b-it | This price... wow, looks like a hearse, does it still work? | 通过 (无共识错误) |
| google/gemma-3-1b-it | This price looks like a magical chariot, huh? |
致命
[准确性]
"magical chariot"
理由: Complete failure to translate slang. '灵车' literally means 'hearse', but in slang, it refers to a service that is about to die or is unreliable (a 'death ride'). The translation 'magical chariot' is a hallucination. The parenthetical '(kuai) 行 (tao)' implies 'fast run' or 'run quickly' (scam), which is completely omitted. | Completely wrong: '灵车' is internet slang for 'scam/ripoff' (metaphorically a 'spiritual hearse'), not 'magical chariot'. The parenthetical (kuai tao) is a pun on '快逃' (run away quickly) hinting the deal is bad | Completely inaccurate. '灵车' is internet slang for a scam/fraud. '快行' (kuai xing) is a pun/slang phrase. Translation misses the meaning entirely - should convey this is a suspicious/scam price. | '灵车' is slang for a dubious/likely-to-fail service/offer, not 'magical chariot'; misses connotation of 'hearse / lemon'. |
| google/gemma-3-4b-it | This price… it looks like a fake charm, okay? |
致命
[准确性]
"fake charm"
理由: Mistranslation of '灵车' (hearse). In slang, '灵车' refers to a service that is about to die or is unreliable (leading to a 'funeral' for your money/data). 'Fake charm' is nonsensical. | Incorrect translation: '灵车' is slang meaning 'a scam or unreliable thing' (metaphor from hearse), and '(kuai)(tao)' is a pun on '快逃' (run away quickly). The translation 'fake charm' misses both the scam meaning and the pun. | Completely misreads “灵车”, which here means goods or a service that look about to fail or are very unreliable, not a “fake charm”. |
| google/translategemma-12b-it | This price... well, it looks like a hearse, but I guess it's okay. |
严重
[准确性]
"but I guess it's okay."
理由: “还(kuai)行(tao)” is a playful reading that suggests “还算可以/还挺划算” (fairly okay / quite a good deal), usually with a sense of decent value for money; “I guess it's okay” is too flat and does not convey the value-for-money or grudgingly satisfied tone. | Failure to translate the pun '还 (kuai) 行 (tao)'. The source implies 'It's about to run away/scam' (快逃). The translation 'it's okay' conveys the opposite meaning of the hidden text. | The text '(kuai)行(tao)' is a pun/censorship of '快逃' (run away/it's a scam). The translation completely misses this meaning. |
| google/translategemma-4b-it | Search |
致命
[准确性]
"Search"
理由: Output bears no relation to the source, which comments on a price looking like a ‘hearse deal’ yet still being kind of acceptable; meaning entirely lost. | Complete mistranslation. The hypothesis 'Search' has no relation to the source text about pricing and quality. | The translation does not match the source text and is incorrect. | Complete hallucination. The source discusses a price looking like a 'hearse' (scam/bad deal) and uses wordplay on 'okay/running away'. The translation 'Search' is unrelated. | Completely incorrect translation: The source is a comment about a price being suspicious ('灵车' means 'scam/ripoff', '快逃' is 'run away'). The hypothesis 'Search' is completely unrelated and appears to be a button label or error. |
| tencent/HY-MT1.5-1.8B | This price, hmm... looks like a hearse, doesn't it? Just okay. |
严重
[准确性]
"Just okay."
理由: 快行 appears to be phonetic slang that's unclear in context. The translation doesn't adequately convey the meaning. | “还(kuai)行(tao)” is a joking reading, close to “还可以/还挺香” (it's okay / actually pretty good), with a positive or self-mocking tone; “Just okay” is neutral or even slightly negative. | Failed to translate the parenthetical slang '(kuai)行(tao)' which implies 'Run away quickly' or 'It's a scam, run'. 'Just okay' completely misses the warning tone. | Incorrect translation. The phrase '灵车' (spirit car/hearse) is internet slang meaning 'scam' or 'ripoff', and '(kuai)行(tao)' is a pun on '快逃' (run away quickly). The translation takes '灵车' literally as 'hearse' and misses the pun entirely. |
| tencent/HY-MT1.5-7B | "en": "This price… well, it looks like a funeral car, but it’s still (okay).", "zh-TW": "這個價格…嗯,看起來像輛靈車,但還是(可以接受)的。」, "ja": "この価格…ええと、霊車のように見えますが、それでも(まあまあです)。」, "ko": "이 가격은… 음, 장례용 차량처럼 보이지만, 그래도 (괜찮네요)。", "de": "Diese Preis… nun, es sieht aus wie ein Leichenwagen, aber es ist trotzdem (in Ordnung).", "fr": "Ce prix… eh bien, ça ressemble à un corbillard, mais c’est quand même (acceptable).", "it": "Questo prezzo… beh, sembra un carro funebre, ma è comunque (accettabile)".}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | We also raise crabs on our own chickens. |
致命
[准确性]
"We also raise crabs on our own chickens."
理由: Failure to recognize slang. '小鸡' refers to a small VPS server, and '养龙虾' refers to high bandwidth usage or specific network activities. The hypothesis translates literally, resulting in nonsense. | Completely wrong: '小鸡' is slang for 'VPS', not literal chicken. '龙虾' is slang for a type of service (lobster), not literal crab | Mistranslation. '小鸡' (chicken) is slang for VPS/server. '龙虾' (lobster) is slang for mining/resource-intensive process. Should be 'also running mining/resource-intensive processes on our own VPS'. | “在自己的小鸡上也是养起了龙虾” is an insider metaphor (such as running a crawler, monitoring, or mining on one's own VPS), not literally raising lobsters or crabs on chickens; the overall meaning is completely misunderstood |
| CohereLabs/tiny-aya-water | Even on their own chickens, they raise crabs. |
严重
[准确性]
"Even on their own chickens, they raise crabs."
理由: “小鸡” means a small server/VPS, and “养龙虾” means running mining or other resource-hogging jobs, a metaphor for abuse; translating them literally as chickens and crabs completely loses the networking/VPS context. | '小鸡' (little chicken) is common slang for a small VPS/server. The translation interprets it as the animal. | 小鸡 (little chicken) is slang for cheap VPS. 龙虾 (lobster) is slang for mining/botnet. Should be 'Even on their own cheap VPS, they run mining operations'. | Completely wrong: 小鸡 is internet slang for a small VPS/server, not literal chickens |
| Qwen/Qwen2.5-14B-Instruct | Also raised lobsters on their own chickens. |
致命
[准确性]
"lobsters"
理由: Complete failure to interpret slang. '小鸡' = VPS, '养龙虾' = running high-bandwidth tasks or specific mining/scripts. The literal translation is nonsensical. | Here “养龙虾” is most likely a metaphorical in-joke (such as running mining, farming tasks, or running crawlers); the literal rendering “lobsters” causes serious misunderstanding. | 养龙虾 is internet slang for 'running mining/botnet operations'. 小鸡 means 'cheap VPS'. The translation is completely literal and misses the technical meaning. |
| Qwen/Qwen2.5-7B-Instruct | In their own chicken farm, they also raised lobsters. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | He also raised lobsters on his own chicks. |
致命
[准确性]
"on his own chicks"
理由: “小鸡” is a joking name for a cheap VPS, not the literal “chicks”; it should be translated as “on his own VPS / little server” | '小鸡' means VPS/server, '龙虾' is slang for cryptocurrency mining. Translation is nonsensical; should be 'mining cryptocurrency on his own VPS' | Mistranslation of '小鸡' (VPS) to 'chicks'. |
| Qwen/Qwen3.5-4B | I also raised lobsters on my own chickens. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Also raised lobsters on their own chickens. |
严重
[准确性]
"raised lobsters"
理由: Complete mistranslation of slang. '小鸡' means small VPS, '养龙虾' means running high-load tasks. Literal translation makes no sense. | 养龙虾 is slang for running cryptocurrency mining or similar resource-intensive operations. 'Raised lobsters' is nonsensical. Should be 'mining' or 'running intensive operations'. | “养龙虾” is a metaphor for running a mining program on a VPS (such as XMRig, whose icon resembles a lobster), not actually raising lobsters. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | On his own little chicken, he has also raised shrimp. |
严重
[准确性]
"On his own little chicken, he has also raised shrimp."
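理由: Should be translated as VPS/server, not literal chicken. | “小鸡” is a VPS, and “养龙虾” refers to running traffic monitoring, probes, or similar (with an icon resembling a lobster/洛神), not actually raising shrimp on a chicken. The translation reads it entirely literally as a farming scene and thoroughly misunderstands the technical metaphor. | Incorrect translation. '小鸡' is slang for VPS, and '龙虾' (lobster) is slang for a specific type of server/service. Not literal chicken and shrimp. |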
| google/gemma-3-12b-it | I also raised lobsters on my own chickens. | Pass (no consensus errors) |
| google/gemma-3-1b-it | In my own small chicken, I also started raising shrimp. |
Major
[Accuracy]
"small chicken"
Reason: '小鸡' (VPS/server) and '龙虾' (lobster/mining) are internet slang terms. Translation is too literal. Should convey running cryptocurrency mining or similar resource-intensive operations on a VPS. | '小鸡' is internet slang for a small VPS/server, not literal chicken. '龙虾' here likely refers to a specific service/container (like 'lobster' in tech context), not literal shrimp | Literal translation of '小鸡'. In this context, it means a small VPS/server. 'Raising shrimp' for '养起了龙虾' is also likely a metaphor for running high-load tasks or earning money, which is lost in the literal translation. The sentence makes no sense in English. | '小鸡' is slang for a small VPS server, not a literal small chicken. |
| google/gemma-3-4b-it | They are also raising lobsters on their own little chickens. |
Major
[Accuracy]
"raising lobsters"
Reason: Literal translation of '养起了龙虾'. This is likely a specific metaphor or typo in the community (perhaps 'running heavy loads' or a specific app name). Literally raising lobsters on a VPS ('little chicken') is nonsensical. | Incorrect translation: '龙虾' (lobster) is internet slang for 'longhorn beetles' or in some contexts refers to certain types of proxy/VPN services. The literal translation 'lobsters' is incorrect in this context. | “养龙虾” is in-group jargon (e.g. running mining), not actually raising lobsters; translating it literally as raising animals misreads the context. |
| google/translategemma-12b-it | I also raised lobsters on my own chickens. |
Critical
[Accuracy]
"raised lobsters on my own chickens"
Reason: “小鸡” is a VPS, and “养龙虾” is a joking term for running probes or monitoring scripts; the overall meaning is “also started running probes/monitoring on my own VPS”; the literal translation invents a scene out of thin air and seriously misunderstands the internet metaphor. | '小鸡' is slang for a 'cheap VPS' (virtual private server), not literal chickens. This is a critical mistranslation of common VPS slang. | Literal mistranslation of slang. '小鸡' means VPS, '养龙虾' likely refers to running high-resource tasks or a specific profitable setup. The translation creates a nonsensical image of animals. |
| google/translategemma-4b-it | They are also raising lobsters on their own chickens. |
Critical
[Accuracy]
"They are also raising lobsters on their own chickens"
Reason: Incorrect translation: '小鸡' is internet slang for a cheap VPS (Virtual Private Server), not literal 'chickens'. The term '龙虾' (lobster) is also slang, likely referring to a specific service or tool. | Failure to decode slang. '小鸡' = VPS, '养龙虾' = running high-bandwidth tasks or earning money. The literal translation creates a nonsensical image of lobsters on chickens. | Takes slang literally: “小鸡” is a VPS and “养龙虾” is running Xray (or similar) on it; translation talks about actual animals and chickens, changing the entire domain. |
| tencent/HY-MT1.5-1.8B | Translation of '在自己的小鸡上也是养起了龙虾.' in en |
Critical
[Accuracy]
"Translation of '在自己的小鸡上也是养起了龙虾.' in en"
Reason: The hypothesis is a placeholder string rather than an actual translation. It literally says 'Translation of [source] in en' instead of providing the translation content. | No actual translation is provided, only template text. | Hypothesis is a meta-description rather than an actual translation. 小鸡 (small chicken) is slang for VPS/server; 龙虾 (lobster) is likely slang for a specific service/application. |
| tencent/HY-MT1.5-7B | Raised lobsters on one's own chicks. |
Critical
[Accuracy]
"Raised lobsters on one's own chicks."
Reason: “小鸡” is slang for a VPS, not chicks; “养龙虾” refers to running a mining trojan on the machine (“mining”), so the whole should read “also started raising ‘lobsters’ (mining) on one's own VPS”; the current translation misunderstands it almost entirely. | Mistranslation. '小鸡' is internet slang for VPS/server (not literal chicks). '养龙虾' is slang meaning to run mining/resource-intensive operations. Should be 'Running mining operations on one's own VPS'. | Mistranslation of '小鸡'. Here it means 'small VPS/server'. 'Lobsters' (龙虾) likely refers to a specific heavy-load application or mining script. The translation suggests raising animals on baby birds. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Please enter your email address. |
Critical
[Accuracy]
"Please enter your email address."
Reason: Bears no relation to the source “本地跑的量化版应该还是不如官方的满血吧...即便官方经常降智力。” and fails to translate the original meaning at all | Complete mistranslation. Has no relation to source text about quantitative trading versions and intelligence loss. | The translation does not match the source content. | Hallucination. The hypothesis has no relation to the source text which discusses local quantized models vs official full versions. | Completely unrelated translation: The source discusses the local 量化版 (quantized version) vs the 官方 (official) version and intelligence reduction, not email |
| CohereLabs/tiny-aya-water | Local runs should still be inferior to the official full version... even though the official often lowers intelligence. |
Major
[Accuracy]
"even though the official often lowers intelligence."
Reason: Misinterpretation: 降智力 means 'degrading performance/quality', not literally 'lowering intelligence' | 降智力 is slang for degraded quality/performance. 'lowers intelligence' is literal and awkward; should be 'degrades in quality' or 'loses performance'. | “官方经常降智力” jokes that the official model's output gets dumber; rendering it as “lowers intelligence” is somewhat literal but basically understandable, with only a slight deviation in style/tone. | Literal translation of '降智力'. In the context of AI models, this means the model's performance or reasoning capability has been reduced/dumbed down, not that it actively 'lowers intelligence' as an action. |
| Qwen/Qwen2.5-14B-Instruct | The local running quantified version should still not be as good as the official full-featured one..., even though the official one often reduces intelligence. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | The local quantized version should still not be as good as the official full-power version... even though the official often reduces intelligence. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | The quantized version running locally is probably still not as good as the official full-powered one... Even though the official one often reduces intelligence. |
Major
[Accuracy]
"reduces intelligence"
Reason: “降智力” means the model's performance gets dumber; “reduce intelligence” is a bit stiff, and “gets dumber / dumbs it down” would fit the tone better | '降智力' is slang for model performance degradation or 'dumbing down'. Literal 'reduces intelligence' sounds like the AI is consciously making itself stupid, rather than performance dropping. | '降智力' means the model/service has reduced capabilities/performance, not literal 'reduces intelligence'. Anthropomorphic translation is imprecise |
| Qwen/Qwen3.5-4B | The quantized version running locally should still be inferior to the official full-bleed version... even if the official team frequently reduces intelligence. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | The locally run quantized version is probably still not as good as the official full-blooded one... even though the official version frequently lowers its intelligence. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The local version's performance should still be inferior to the official version with full blood, even if the official version often lowers intelligence. | Pass (no consensus errors) |
| google/gemma-3-12b-it | A locally run quantized version probably still isn't as good as the official full-blooded one... even though the official version often lowers intelligence. | Pass (no consensus errors) |
| google/gemma-3-1b-it | The local version should still be inferior to the official version... even though the official often reduces intelligence. |
Minor
[Accuracy]
"even though the official often reduces intelligence."
Reason: '降智力' is internet slang meaning 'becoming dumber/less capable', not literally 'reduces intelligence'. The tone is about AI models getting weaker, not literal intelligence reduction | '满血' means 'full power/full version'. '降智力' is internet slang for 'making poor decisions/degrading'. Translation is mostly correct but could be more idiomatic. | Literal; intended meaning is 'even though the official one often behaves dumber / gets nerfed', but overall idea is conveyed. | Literal translation of '降智力'. In the context of AI models, this means the model's performance or capability has been reduced/dumbed down. 'Reduces intelligence' sounds like the official entity is actively making things less smart, rather than the model version being less capable. 'Degraded performance' would be better. |
| google/gemma-3-4b-it | The locally run quantized version is still not as good as the official full-blooded one… even though the official often lowers intelligence. | Pass (no consensus errors) |
| google/translategemma-12b-it | The locally run quantized version is probably still not as good as the official full version... even though the official version often reduces intelligence. |
Minor
[Style]
"reduces intelligence"
Reason: '降智力' is slang for 'degrading performance/reducing capabilities' (often used for AI models), 'reduces intelligence' is too literal. The tone is about capability degradation, not literal intelligence. | Literal translation of '降智力'. While understandable, it sounds awkward in English technical contexts compared to 'performance drops' or 'dumb down'. | “经常降智力” jokes that the model gets dumber; the literal “reduces intelligence” sounds clinical and odd; more natural would be “gets dumber / becomes less smart sometimes”. |
| google/translategemma-4b-it | The locally run quantized version is still not as good as the official full-blooded version... even though the official version often downgrades intelligence. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | The locally run quantitative version should still be inferior to the official full-capacity version... even though the official version often reduces intelligence. | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | "en": "The localized version running locally should still be inferior to the official full-powered version, right? Even though the official version often reduces intelligence.", "zh-TW": "在本地运行的本地化版本应该仍然不如官方的完整版,对吧?即使官方版本经常降低智力值。", "ja": "ローカルで実行されるローカライズ版は、公式のフルパワーバージョンにはまだ及ばないでしょうね…たとえ公式バージョンがよく知能値を下げても。」, "ko": "로컬에서 실행되는 현지화 버전은 공식의 풀 파워 버전에 비해 여전히 부족할 것입니다…공식 버전이 자주 지능 값을 낮추더라도 말입니다。", "de": "Die lokalisierte Version, die auf dem lokalen Rechner ausgeführt wird, sollte immer noch nicht so gut sein wie die offizielle Vollpower-Version, oder? Auch wenn die offizielle Version oft die Intelligenzwert senkt。", "fr": "La version localisée qui s’exécute sur un ordinateur local ne devrait toujours pas être aussi bonne que la version officielle à pleine puissance, n’est-ce pas ? Même si la version officielle diminue souvent la valeur de l’intelligence。", "it": "La version localizzata che viene eseguita su un computer locale non dovrebbe comunque essere migliore della versione ufficiale a pieno potenziale, giusto? Anche se la versione ufficiale riduce spesso il valore dell’intelligenza。" |
Critical
[Accuracy]
"localized version running locally"
Reason: The hypothesis contains raw model output including multiple languages and JSON formatting artifacts instead of a clean Chinese translation. | Hypothesis contains malformed multilingual output with mixed language tags and formatting errors. The translation also misses context about what '量化版' and '满血' refer to in the specific domain being discussed. | “本地跑的量化版” refers to a quantized (compressed) model run locally, not a “localized version”. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | This is a test. |
Critical
[Accuracy]
"This is a test."
Reason: Hallucination. The hypothesis is a placeholder string and does not translate the source praising a detailed tutorial. | Completely unrelated translation: The source praises a detailed tutorial as '喂饭' (spoon-feeding/very detailed guide), not a test | Has nothing to do with the source “写得太详细,喂饭教程” and conveys none of the source information | The translation does not match the source content. | Complete mistranslation. Source discusses detailed tutorial, not a test. |
| CohereLabs/tiny-aya-water | The author is too detailed, it's pure feeding tutorial! | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | The explanation is very detailed, a genuine feeding tutorial! | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | The master has written too detailed instructions! A pure feeding guide! | Pass (no consensus errors) |
| Qwen/Qwen3-14B | The master wrote it too detailed, it's like a pure feeding tutorial! |
Major
[Accuracy]
"a pure feeding tutorial"
Reason: “喂饭教程” is a metaphor for something extremely easy, like being spoon-fed; it should be “hand-holding tutorial / spoon-feeding guide”; the literal “feeding tutorial” is unnatural and fails to capture the spirit | '喂饭教程' is slang for an extremely detailed, step-by-step tutorial (spoon-feeding). Literal 'feeding tutorial' sounds like instructions on how to feed someone/something. | '喂饭教程' means 'spoon-feeding tutorial' (overly detailed/easy), not literal 'feeding tutorial'. Idiomatic meaning lost |
| Qwen/Qwen3.5-4B | The tutorial written by this expert is incredibly detailed and authentic! | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | The tutorial written by this expert is extremely detailed! A pure feeding-tutorial! | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The guide is too detailed, just feeding instructions! | Pass (no consensus errors) |
| google/gemma-3-12b-it | The boss wrote this so thoroughly; a pure feeding tutorial! | Pass (no consensus errors) |
| google/gemma-3-1b-it | The detailed instructions written by '大佬' are very thorough and perfect for feeding! | Pass (no consensus errors) |
| google/gemma-3-4b-it | The details written by seniors are too thorough; it's a genuine feeding tutorial! | Pass (no consensus errors) |
| google/translategemma-12b-it | The author wrote this very thoroughly; it's a pure and simple tutorial! | Pass (no consensus errors) |
| google/translategemma-4b-it | The instructions written by experts are too detailed and provide a genuine feeding tutorial! | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | The guy wrote too detailed, pure feeding tutorials! |
Major
[Accuracy]
"pure feeding tutorials"
Reason: “喂饭教程” is a metaphor for “hand-holding, very detailed and easy to follow”, not actual “feeding”; the literal translation leads to misunderstanding. | Literal translation of '喂饭教程'. The idiom means a 'step-by-step tutorial so easy even a baby could do it' (spoon-feeding). 'Feeding tutorials' sounds like teaching someone how to feed. | 喂饭教程 is internet slang meaning 'spoon-feeding tutorials' (overly detailed/easy tutorials). 'Feeding tutorials' is awkward; should be 'spoon-fed tutorials' or 'hand-holding guides'. |
| tencent/HY-MT1.5-7B | The boss wrote this in such detail—it’s a perfect tutorial for beginners! |
Minor
[Accuracy]
"perfect tutorial for beginners"
Reason: Mistranslation of '喂饭教程'. This internet slang means 'spoon-feeding tutorial' (overly detailed, step-by-step), not just 'tutorial for beginners'. The connotation is slightly negative/sarcastic. | “喂饭教程” emphasizes very detailed, foolproof hand-holding instruction and does not necessarily mean “for beginners”; it could be rendered as “hand-holding tutorial / step-by-step guide”. | '喂饭教程' literally means 'spoon-feeding tutorial', implying it is so detailed no effort is required. 'Perfect tutorial for beginners' captures the gist but loses the specific idiomatic flavor. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Your feedback is valuable to us, so please feel free to share any suggestions or concerns. |
Critical
[Accuracy]
"Your feedback is valuable to us, so please feel free to share any suggestions or concerns."
Reason: Completely unrelated translation: The source says 'the project idea seems fun, bookmarking it' (马克一下 = bookmark), not feedback | Hallucination. The source expresses interest in a project ('马克一下' means to bookmark/save), while the hypothesis is a generic customer service response. | Does not match the meaning of “觉得佬友的项目思路很好玩,马克一下” at all; the source praises the project idea as fun and bookmarks it | The translation does not match the source content. | Complete mistranslation. Source means 'the project idea is interesting, I'll bookmark it', not feedback request. |
| CohereLabs/tiny-aya-water | I think your project idea is fun, mark me. |
Major
[Accuracy]
"I think your project idea is fun, mark me."
Reason: Mistranslation of '马克' (Mark/Bookmark). The user intends to bookmark the post for later, not mark themselves. | 马克一下 means 'bookmark/save for later', not 'mark me'. Should be 'I'll bookmark this' or 'saving this'. | “马克一下” is internet slang for “bookmark it / save it”, not “mark me”; the meaning deviates seriously. | Misinterpretation: 马克 means 'bookmark/mark for later', not 'mark me' (volunteering) |
| Qwen/Qwen2.5-14B-Instruct | I think my friend's project idea is really fun, marking it. |
Major
[Accuracy]
"my friend's project idea"
Reason: 佬友 is internet slang for 'buddy/friend in the community', not just 'friend'. The possessive 'my' is also unclear from the original. | '佬友' refers to a respected community member, not necessarily a personal 'friend'. | “佬友” is a form of address for fellow forum members/experts and refers to the other person, not “my friend”; adding “my” changes the grammatical person. |
| Qwen/Qwen2.5-7B-Instruct | I feel that Laoyou's project concept is very interesting, mark it down. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | I think my friend's project idea is really fun, I'll bookmark it. |
Major
[Accuracy]
"my friend's project"
Reason: '佬友' refers to a respected community member, not specifically 'my friend' | “佬友的项目” refers to the project of a certain expert friend, not “my friend”; it should be “this bro’s project / the fellow pro’s project” or similar, without introducing first-person possession | '佬友' is internet slang for 'buddy/friend in the community', not 'my friend'. The speaker is referring to someone else's project, not their own | '佬友' refers to a peer/expert in the community, not necessarily a personal 'friend'. |
| Qwen/Qwen3.5-4B | Felt that Lao You's project ideas were very fun; marked them. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | I feel that Lao You's project idea is quite fun; I'll mark this. |
Major
[Accuracy]
"Lao You's"
Reason: '佬友' is a community term for 'fellow experts/members'; treating it as a proper name 'Lao You' is incorrect. | Incorrect translation: '佬友' is a friendly term for 'buddy' or 'friend' in Chinese internet slang, not a proper name. It should be translated as 'buddy' or 'friend', not transliterated as 'Lao You'. | 佬友 is a general slang term for 'buddy/friend' in tech communities, not a proper name. Should be 'this friend's project' or 'buddy's project'. | “佬友” is a form of address for forum experts/fellow users, not the proper name “Lao You”; it was mistranslated as a personal name. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The project ideas of my friends are very fun to play with, mark them down. | Pass (no consensus errors) |
| google/gemma-3-12b-it | I think Lao You's project idea is fun, I'll bookmark it. | Pass (no consensus errors) |
| google/gemma-3-1b-it | The idea for the project by Feel-Looy is very fun and engaging. Let me summarize. |
Major
[Accuracy]
"Feel-Looy"
Reason: '佬友' means 'buddy/friend in the community', not 'Feel-Looy'. '马克一下' means 'bookmark/mark it' not 'summarize'. Translation is inaccurate on multiple counts. | '佬友' is internet slang for 'fellow user/member', not a name 'Feel-Looy'. '马克' means 'bookmark/mark', not 'summarize' | Mistranslation of '佬友'. '佬友' means 'fellow experienced user/friend'. 'Feel-Looy' is a phonetic hallucination. '马克' (Mark) means to bookmark/save for later, translated as 'summarize' which is incorrect. | Misinterprets '佬友' (fellow expert/member); introduces a non-existent proper name. |
| google/gemma-3-4b-it | Success |
Critical
[Accuracy]
"Success"
Reason: Does not match the source “感觉佬友的项目思路很好玩儿,马克一下” at all; the original meaning is finding the project idea interesting and bookmarking it. | Completely incorrect output: The hypothesis is an error message rather than a translation of the source text '感觉佬友的项目思路很好玩儿,马克一下' (Feeling that the friend's project idea is interesting, bookmarking it). | Hypothesis is a system message, not a translation. Complete failure to translate the source text. | Complete hallucination. The source says '感觉佬友的项目思路很好玩儿,马克一下' (Feel that the friend's project idea is fun, marking it/bookmarking it). The translation is unrelated. | The translation is unrelated to the source text. |
| google/translategemma-12b-it | I think Lao You's project ideas are very interesting; I'll bookmark them. | Pass (no consensus errors) |
| google/translategemma-4b-it | "en": "I think your project idea is really interesting and innovative. I'm saving it.", "zh-TW": "我覺得你的專案想法很有趣且創新。我把它記下來。", "ja": 「あなたのプロジェクトのアイデアはとても興味深く、革新的だと思います。メモしておきます。」, "ko": "당신의 프로젝트 아이디어가 정말 흥미롭고 혁신적이라고 생각합니다. 저장해둡니다.", "de": "Ich finde, dass deine Projektidee sehr interessant und innovativ ist. Ich merke sie mir.", "fr": "Je trouve que ton idée de projet est très intéressante et novatrice. Je la note.", "it": "Penso che la tua idea di progetto sia molto interessante e innovativa. La prendo nota." } <|im_start|> | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Now let's move on to the next task. |
Critical
[Accuracy]
"Now let's move on to the next task."
Reason: Hypothesis is completely unrelated to source. Source is about finding a friend's project interesting and bookmarking it. Hypothesis appears to be a generic transition phrase. | Doesn't match the source content. | Does not match the source sentence “感觉佬友的项目思路很好玩儿,马克一下。” at all; the source comments that the project is fun and bookmarks it. | Complete hallucination. The source text expresses interest and saving the post ('马克一下'), but the translation is an unrelated command. | Completely unrelated translation. The source says '感觉佬友的项目思路很好玩儿,马克一下' meaning 'I think the friend's project idea is very interesting, let me bookmark it (mark it)'. The hypothesis is a generic unrelated sentence about moving to the next task. |
| tencent/HY-MT1.5-7B | "en": "I find the project approach of this guy really interesting. Mark it down.", "zh-TW": "我觉得這個人的項目方法真的很有趣。記下它吧!", "ja": "この人のプロジェクトアプローチが本当に面白いと思う。メモしておこう。」, "ko": "이 사람의 프로젝트 접근 방식이 정말 흥미롭다고 생각합니다。메모해 두세요.", "de": "Ich finde die Projektansatz dieser Person wirklich interessant. Notieren Sie es sich.", "fr": "Je trouve que l’approche projet de cette personne est vraiment intéressante. Notez-la.", "it": "Ritengo che l’approccio progettuale di questa persona sia davvero interessante. Prendetela nota."} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Requesting something, let's hang out. |
Major
[Accuracy]
"Requesting something, let's hang out."
Reason: Mistranslation. '同求,蹲一下' means 'same request, I'm waiting/looking for the same thing', not 'let's hang out'. | The term '蹲一个' refers to waiting, not hanging out. | “同求,蹲一个” means “same request, camping the thread to wait for the resource”, not “going out to have fun / hanging out” | Partial mistranslation: '同求' means 'same request/also looking for'. '蹲一个' means 'waiting for one' (like 'count me in'), not 'hang out' | Mistranslation of '蹲一个'. In this context, it means 'waiting for someone to provide the requested item/info', not socially hanging out. |
| CohereLabs/tiny-aya-water | Requesting something, let's do this. |
Major
[Accuracy]
"Requesting something, let's do this."
Reason: “同求,蹲一个” means “I'm asking for one too, camping the thread to wait for resources/replies”, expressing waiting together for a resource; the translation becomes a generic “requesting to do something” and conveys neither “同求” nor the sense of camping and waiting. | Misinterpretation: 蹲 means 'waiting/hoping', not 'let's do this' | Mistranslation of '蹲一个' (squatting for one). This slang means waiting for someone to offer the requested item. 'Let's do this' implies immediate action. | 同求,蹲一个 means 'same request, waiting for someone to provide'. Should be 'Same request, waiting for one' or 'Also looking for this'. |
| Qwen/Qwen2.5-14B-Instruct | Looking for the same, waiting for one. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Looking for one.,蹲一个。 |
Critical
[Fluency]
"Looking for one.,蹲一个。"
Reason: Mixes Chinese and English with an extra period, leaves “蹲一个” (meaning to camp and wait) untranslated, and is overall unnatural and incomplete in meaning. | Incomplete translation with untranslated Chinese text left in output; 蹲一下 means 'waiting/lurking for one' | Failed to translate the second part of the sentence. '蹲一个' means 'waiting for one' or 'camping for one'. |
| Qwen/Qwen3-14B | Same here, waiting for one. | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Same request, waiting for one. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Looking for someone like me. |
Major
[Accuracy]
"Looking for someone like me."
Reason: Incorrect translation: The source '同求,蹲一个' means 'Same request, waiting for one too' or 'I'm also looking for it, waiting for someone to respond'. The translation 'Looking for someone like me' is incorrect. | “同求,蹲一个。” means “I'm asking too and likewise waiting (for the resource/link)”, which has nothing to do with “looking for someone like me”; the meaning deviates completely. | 同求蹲一个 means 'same request, waiting for someone to provide/share'. Should be 'Same request, waiting for it' or 'Looking for the same thing'. | '同求,蹲一个' means 'I also want this, waiting for one to appear'. The hypothesis completely misinterprets the meaning. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Same request, squat one. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Looking for the same, waiting for one. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Translation of '同求,蹲一个' in en |
Critical
[Accuracy]
"Translation of '同求,蹲一个' in en"
Reason: No actual translation is provided; the hypothesis is a meta-statement, not a translation. | The translation is not provided; it is just a note. | No actual translation provided. '同求' means 'same request/me too' and '蹲一个' means 'waiting for one/count me in'. Should be 'Same here, count me in' or similar. | Not a translation - this is just the source text with 'en' appended. No actual translation provided | The hypothesis is a meta-comment describing the task instead of performing the translation. '同求' means 'Same request' and '蹲一个' means 'Waiting for one/Squatting for one' (slang for waiting for someone to share/sell). |
| google/gemma-3-4b-it | Looking for someone to share with, looking for a place. | Pass (no consensus errors) |
| google/translategemma-12b-it | Looking for the same thing, waiting for a solution. | Pass (no consensus errors) |
| google/translategemma-4b-it | Message |
Critical
[Accuracy]
"Message"
Reason: Complete hallucination. Source means 'Same request, waiting for one'. Translation is unrelated. | Complete mistranslation. Source means 'Same request, waiting for it' (蹲 means squatting/waiting). 'Message' is unrelated. | The translation does not match the source text and is incorrect. | Completely incorrect translation: The source '同求,蹲一个' means 'Same request, waiting for one too' or 'I'm also looking for it, please DM me'. The hypothesis 'Message' is completely unrelated and appears to be a button label or error. | “同求,蹲一个。” means ‘I’m also looking for this, waiting for an offer’, whereas “Message” is unrelated; original meaning missing. |
| tencent/HY-MT1.5-1.8B | Same request, squat one. |
Major
[Accuracy]
"squat one"
Reason: “蹲一个” is internet slang for waiting for an opportunity/resource; the literal “squat” evokes a physical action and is semantically wrong. | Literal translation of '蹲'. In internet slang, '蹲' means 'waiting for' or 'keeping an eye out for'. 'Squat one' is nonsensical. | 蹲 is internet slang meaning 'waiting for' or 'looking for'. Should be 'Same request, waiting for one' or 'Same, looking for one'. |
| tencent/HY-MT1.5-7B | Request for assistance; need to squat down. |
Critical
[Accuracy]
"Request for assistance; need to squat down."
Reason: “同求,蹲一个。” means “+1, also waiting for one (resource/reply)”; “蹲” is a metaphor for camping a thread to wait for updates and has nothing to do with “squatting down”. | Literal translation of '蹲'. In this context, '蹲一个' means 'waiting for one' or 'keeping an eye out for one'. It does not mean physically squatting. | Mistranslation. '同求' means 'same request/me too' and '蹲一个' is internet slang meaning 'waiting for the same thing/following this thread'. Should be 'Same request; waiting for an answer too'. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | I'm not sure if you can understand what I'm saying, but I'll try to explain it simply. |
Critical
[Accuracy]
"I'm not sure if you can understand what I'm saying, but I'll try to explain it simply."
Reason: Hallucination. The source states the user has the same problem but cannot help, only offering support ('DD' means to bump/support). The hypothesis is completely unrelated. | Unrelated to the source sentence “之前只是大概接触过…遇到同样问题…爱莫能助,只能DD一下了” and deviates from the original meaning entirely | Complete mistranslation. Source discusses encountering same problem and offering sympathy/support, not explaining something. | Major omission: The source says 'I only had brief contact before, encountered same problem as poster, can't help, can only DD (顶顶/bump)'. The hypothesis ignores most of this | The translation does not match the source content. |
| CohereLabs/tiny-aya-water | I had only vaguely touched upon this before, but I also encountered issues similar to those faced by the poster, and was helpless; I could only post here. |
Major
[Accuracy]
"I could only post here."
Reason: “只能DD一下了” means “can only bump the thread / bump it in support”, not “can only post here”; the meaning of “DD” as “顶顶” is not preserved. | DD is internet slang (顶/bump). 'post here' is vague; should be 'bump the thread' or 'upvote the post'. | Misinterpretation: DD likely means '顶顶' (bump/upvote), not 'post here' | Mistranslation of 'DD'. In this context, 'DD' means 'Ding Dong' (bumping the thread) or expressing sympathy/support, not just 'posting'. |
| Qwen/Qwen2.5-14B-Instruct | Had only briefly encountered this before, but I've run into the same issue as the OP and can't help either. Guess I'll just DD. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | I had only briefly come across this before and also encountered the same issues as the poster, but was unable to offer any help, so I could only DDoS them. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Before, I only had a rough exposure to it, but I also encountered the same problem as the poster. I can't help, so I'll just DD it. |
Major
[Accuracy]
"DD it"
Reason: 'DD' in this context likely means 'Ding Dong' (notify me) or 'Up' the thread, or potentially 'Direct Message'. Without context, 'DD' is ambiguous jargon that should be explained or translated to the intended action (e.g., 'bump it' or 'notify me'). | 'DD' is internet slang (likely 'ditto' or expressing sympathy), but leaving it as 'DD' without explanation is unclear. Should be translated as 'express sympathy' or similar | “只能DD一下了” mostly refers to technical operations such as “reinstalling the system / redeploying”; “DD it” is opaque to general readers and left unexplained, so the meaning of the key action is lost |
| Qwen/Qwen3.5-4B | Previously I only had a general acquaintance with it, but I encountered the same issue as the OP and couldn't do much about it. Can only DD. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | I've only had a rough contact with it before, but I encountered the same problem as the original poster. Helpless, can only DD. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Previously, I only had a rough acquaintance, but I also encountered the same problem as the original poster. Helpless, I could only DD now. |
Major
[Accuracy]
"I could only DD now."
Reason: 'DD' in Chinese internet slang means to bump/support a post, not a generic action. Translation 'DD now' is unclear without explanation of the acronym meaning. | Here “DD一下” is forum slang for “bumping the thread / showing support”, unlike “do DD / direct download” and the like. The translation “could only DD now” is semantically unclear and misreads the original meaning of “can only help bump it”. | 'DD' is slang for 'Ding Ding' (upvote/bump) or 'Da Da' (hit/support). Leaving it as 'DD' without context or translating the action (e.g., 'give a bump') reduces clarity for non-insiders. | Incorrect translation of 'DD一下' (ding ding/knock to show interest). The meaning of 'DD' as a forum term is not conveyed. |
| google/gemma-3-12b-it | I've only had some vague exposure to this before, but I encountered the same problem as the original poster. I can't really help, so I just wanted to chime in. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Previously, I only had a basic contact with it, but I also encountered the same issue as the post-sharing topic, and I can only DD it. |
Major
[Accuracy]
"post-sharing topic"
Reason: '爱莫能助' is a fixed phrase meaning 'I wish I could help but I can't'. '帖主' means 'thread starter/OP'. 'DD' is unclear - should be 'bump' or 'follow up'. Translation misses key meanings and is awkward. | Awkward and not accurate; should be 'thread starter' or 'OP', not 'post-sharing topic'. | Mistranslation of '帖主'. '帖主' means 'thread starter' or 'OP' (Original Poster). 'DD' is slang for 'ding' (bumping the thread) or 'private message', translated literally as 'DD' which is unintelligible to a general English audience without context. | 'DD' is internet slang meaning 'to bump/refresh a post' (like '顶顶'), not clear what 'DD it' means. '跟帖主' means 'the original poster', not 'post-sharing topic' |
| google/gemma-3-4b-it | I had only vaguely encountered this before, but I also ran into the same problem as the comment author, unable to help and could only DD (delete). |
Major
[Accuracy]
"could only DD (delete)"
Reason: Incorrect translation: 'DD' in this context likely means '顶顶' (bump/push up the post) or '滴滴' (message/me), not 'delete'. In Chinese forum slang, DD is often used to mean 'bump' or 'send a message', not delete. | In these circles “DD一下” usually means “placing an order, ordering, supporting with a purchase”, not “delete”; reading it as deletion is the opposite of the original meaning. | Mistranslation of 'DD'. In this context (forums/social media), 'DD' usually means 'Ding Ding' (bumping the thread) or 'Da Dian' (calling), or simply echoing the request. It rarely means 'delete'. Given the context '只能 DD 一下', it likely means 'I can only bump this thread' or 'I can only support/upvote'. |
| google/translategemma-12b-it | I've only had some basic exposure to this before, but I encountered the same problem as the original poster. I can't really help, so I just wanted to express my sympathy. | Pass (no consensus errors) |
| google/translategemma-4b-it | Restart |
Critical
[Accuracy]
"Restart"
Reason: Completely unrelated to source, which says the speaker only roughly used it before, met same problem, can’t help, and can only ‘DD’ (support/upvote); semantic content is lost. | The translation does not match the source text and is incorrect. | Complete hallucination. Source expresses inability to help and 'bumping' the thread (DD = ding/dig). Translation is unrelated. | Complete mistranslation. The source discusses encountering similar problems and can only 'DD' (bump/support). 'Restart' is entirely unrelated. | Completely incorrect translation: The source discusses having encountered the same problem as the original poster, being unable to help, and only being able to 'DD' (subscribe/follow). The hypothesis 'Restart' is completely unrelated and appears to be a button label or error. |
| tencent/HY-MT1.5-1.8B | Previously, I only had a general understanding of the issue, but I also encountered the same problem as the poster. I can't help but give some advice. |
严重
[准确性]
"I can't help but give some advice."
理由: DD is internet slang (顶/bump). 爱莫能助 means 'unable to help'. Should be 'I can't help, just bumping the thread' or similar. | Complete inversion of meaning. '爱莫能助' means 'willing to help but unable to do so'. The translation says the opposite. | “爱莫能助,只能DD一下了” means “I can't help, so all I can do is bump the thread/show support”, not “give advice”; the meaning is inverted. | Incorrect translation. 'DD一下' means 'to bump/dd (double click/follow)' - it's internet slang for showing support or following a post, not 'give some advice'. The translation adds content not in the original and misses the meaning of 'DD'. |
| tencent/HY-MT1.5-7B | I also only had a rough understanding before, but I encountered the same problem as the poster. I was unable to help and could only leave a comment. | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Same as above, help top up. |
严重
[准确性]
"Same as above, help top up."
理由: Mistranslation of internet slang '帮顶' (bump/support) as 'top up' (recharge money). | Mistranslation: '帮顶' is internet slang meaning 'to bump a post' (supporting/upvoting), not 'help top up' | The term '帮顶' is slang meaning 'to help boost' or 'to support,' but the translation uses 'help top up,' which is slightly off. | “同上,帮顶。” is forum usage meaning “same as the reply above; helping to bump the thread”, which differs from “top up” (recharging credit); the overall meaning is not conveyed correctly. | Incorrect translation of '帮顶' (internet slang meaning 'help bump/support a post'). 'Help top up' is not the correct meaning. |
| CohereLabs/tiny-aya-water | Same as above, help top. |
严重
[准确性]
"Same as above, help top."
理由: “同上,帮顶。” is forum usage meaning “same as above, helping to bump (the post)”. The literal rendering “help top” is not fluent and does not convey the sense of “顶帖” (bump the thread). | Incorrect translation of '帮顶' (internet slang meaning 'help bump/support the post'). 'Help top' is not idiomatic English and misses the meaning. | Mistranslation: '帮顶' means 'to bump a post' (supporting/bumping a thread), not 'help top' | Literal translation of internet slang '帮顶' (bump/support) results in nonsensical English. | The translation does not accurately convey the slang term '帮顶' which means to support or promote. |
| Qwen/Qwen2.5-14B-Instruct | Same as above, support. |
严重
[准确性]
"Same as above, support."
理由: Literal rendering misses the forum-slang meaning; it should convey ‘same as above’ + ‘bumping the thread’ rather than generic ‘support’. | Incorrect translation of '帮顶' which means 'bump' (to support a post by adding a reply), not 'support' | Incorrect translation direction. Source is Chinese, hypothesis is English. Should translate FROM Chinese TO target language, not the reverse. |
| Qwen/Qwen2.5-7B-Instruct | Same here, give a thumbs up. |
严重
[准确性]
"Same here, give a thumbs up."
理由: “同上,帮顶” is forum language meaning “same as above, helping to bump the thread”, which is different from giving a like; “thumbs up” mistranslates “帮顶”. | The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). | Incorrect translation direction - source is Chinese to English but appears to be English output. '同上,帮顶' means 'same as above, please upvote/support' in forum context, not 'give a thumbs up' | The translation is slightly too literal and may not fully capture the slang nuance. |
| Qwen/Qwen3-14B | Same as above, help to top. |
严重
[准确性]
"help to top"
理由: Literal translation of '帮顶' fails to convey the forum slang meaning of 'bumping' a thread to increase visibility. | Incorrect translation of '帮顶' (internet slang meaning 'help bump up/support'). 'Help to top' is awkward and doesn't convey the intended meaning of supporting/bumping a post. | The translation does not accurately capture the slang meaning of '帮顶' which implies supporting or promoting something. | Incorrect translation of internet slang '帮顶' which means 'bump' (to support a post by replying), not 'help to top' |
| Qwen/Qwen3.5-4B | Same as above, upvote. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Same as above, bump. |
严重
[准确性]
"Same as above, bump."
理由: Source is Chinese slang about replying to bump a thread; target is a literal English forum phrase but target language should be zh and meaning is not preserved in Chinese. | The source text is Chinese internet slang meaning 'Ditto, bumping this thread.' The hypothesis translates it into English, failing to translate the content into the target language (zh). The output should be Chinese. | 帮顶 is Chinese forum slang meaning 'help boost/support a post' - 'bump' is not an accurate translation of this specific meaning in context |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Same as above, please support. |
严重
[准确性]
"Same as above, please support."
理由: The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. | “同上,帮顶” is forum usage meaning “same as above, helping to bump the thread”; the translation should convey ‘bump/boost the thread’ rather than support in general. | Mistranslation: '帮顶' means 'to bump a post' (support by upvoting), not 'please support' in general sense | Mistranslation of internet slang. '帮顶' means 'help bump/support the post' in forum context, not a general request for support. Should be 'Same as above, please bump the post' or similar. |
| google/gemma-3-12b-it | Same as above, bump it. |
严重
[准确性]
"Same as above, bump it."
理由: 帮顶 is internet slang meaning 'help boost/support a post' in forums, not 'bump it'. 'Bump it' is too casual and doesn't capture the supportive meaning. | Mismatch with target language: the required language is zh, but the hypothesis is in English. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed (Target Language should be zh). |
| google/gemma-3-1b-it | Same as above, top up. |
严重
[准确性]
"Same as above, top up."
理由: The source is the colloquial forum reply “同上,帮顶”, meaning “I agree with the above and am helping to bump the thread”; it has nothing to do with “recharging or adding money”, so the overall meaning is misunderstood. | Incorrect translation of '帮顶' (meaning to bump/support a post) - translated as 'top up' which is unrelated | Mistranslation of slang '帮顶' (bump/support) as 'top up' (recharge/add funds). The source is Chinese, but the hypothesis is English, indicating a complete language mismatch. | Incorrect translation of '帮顶' as 'top up' instead of 'support or promote.' | Incorrect translation of '帮顶'. '帮顶' means 'help bump/support the post' (internet slang), not 'top up'. Should be 'help bump' or 'support'. |
| google/gemma-3-4b-it | Muokkaa |
致命
[准确性]
"Muokkaa"
理由: Completely unrelated to the Chinese source; source means roughly “same as above, bumping the post.” No semantic overlap. | The hypothesis is in Finnish ('Edit') while the target language is Chinese. The source meaning ('Same as above, bumping this up') is completely lost. | Hypothesis is in Finnish, not English translation of Chinese source. Complete mismatch with source content. | Complete mistranslation - source is Chinese internet slang meaning 'same as above, bump' but hypothesis is Finnish word meaning 'edit' | The translation does not match the source text and context. |
| google/translategemma-12b-it | Same as above, please support. |
严重
[准确性]
"Same as above, please support."
理由: Mistranslation: '帮顶' means 'to bump a post' (support by upvoting/replying to keep it visible), not 'please support' in a general sense | In a forum context “帮顶” means “helping to bump the post to the top”, not support in general; it should be rendered as something like “bump this thread” to preserve the pragmatic function. | Mistranslation of internet slang. '帮顶' means 'bump/upvote the post' not 'please support'. Should be 'Same as above, please bump' or 'Same as above, upvote please'. |
| google/translategemma-4b-it | OK |
致命
[准确性]
"OK"
理由: The translation completely fails to convey the meaning of the source text ('Same as above, bumping this post'). It appears to be a placeholder or unrelated text. | Translation is completely wrong. '同上,帮顶' means 'Same as above, bump' (bumping a post to the top). The hypothesis 'OK' is nonsensical and unrelated to the source. | The translation does not convey the meaning of the source text, which is a request for support. | Hypothesis does not translate source text. Source means 'Same as above, please support/bump the post' but hypothesis is just 'OK' | Completely untranslated; source means something like “same as above, bumping the thread.” |
| tencent/HY-MT1.5-1.8B | Translation of '同上,帮顶.' in en |
致命
[准确性]
"Translation of '同上,帮顶.' in en"
理由: The hypothesis is a meta-commentary describing the translation task instead of providing the actual translation into the target language (zh). | Hypothesis is not a translation but a placeholder indicating translation was attempted but not provided | Not a translation into the target language (zh); it's meta-text about translation and in English, failing to convey meaning of the source. | The hypothesis does not provide a translation but merely states that a translation is needed. | Hypothesis is not a translation but a meta-description. Should provide actual English translation of the Chinese source text. |
| tencent/HY-MT1.5-7B | "en": "Same as above, please help support.", "zh-TW": "與上述相同,請幫助支持。」, "ja": "同上のように、ご支援をお願いします。", "ko": "위와 동일하게, 도움을 주시기 바랍니다。", "de": "Gleich wie oben beschrieben, bitte unterstützen Sie uns.", "fr": "Comme décrit ci-dessus, veuillez nous aider à soutenir les efforts.", "it": "Allo stesso modo descritto sopra, vi preghiamo di aiutarci a sostenere questa attività."}\|im_end\|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Following the ideas provided by Lubo's posts, I also have an account pool... |
严重
[准确性]
"Lubo's"
理由: “佬友” is a form of address for forum veterans/friends, not a proper noun or the personal name “Lubo”; the term of address is misunderstood. | Mistranslation of '佬友' (slang for experienced users/friends in a community) as a proper name 'Lubo'. | '佬友' is internet slang for 'fellow users/buddies', not a proper name 'Lubo'. Should be translated as 'fellow users' or similar. | Mistranslation: '佬友' is a slang term for 'buddy/friend' in a specific community, not a proper name 'Lubo'. '养号池' means 'maintaining an account pool' for specific purposes, not just 'have an account pool' |
| CohereLabs/tiny-aya-water | Following the ideas provided by Mr. Yu's posts, I also have an account pool... |
严重
[准确性]
"Mr. Yu"
理由: The slang term '佬友' refers to friends, not a specific person named Mr. Yu. | Mistranslation of '佬友' (a colloquial term for fellow users/experts in a community) as a specific name 'Mr. Yu'. | '佬友' is internet slang for 'buddy/friend', not 'Mr. Yu'. The translation misidentifies the term. | Mistranslation: '佬友' is slang for 'buddy/friend' in the community, not a proper name 'Mr. Yu' | “佬友” is a slang form of address for veterans and friends on a forum or within a community, not the proper name “Mr. Yu”. Misreading it as a surname produces a substantive mistranslation. |
| Qwen/Qwen2.5-14B-Instruct | Following the approach provided in my friend's post, I also maintained an account pool... |
严重
[准确性]
"my friend's"
理由: Mistranslation of '佬友' (lao you), which is community slang for 'fellow experts' or 'senior members', not literally 'my friend'. | Incorrect translation direction. Source is Chinese, hypothesis is English. Should translate FROM Chinese TO target language. | “佬友” is forum slang for fellow users/peers, not specifically ‘my friend’; relationship is overstated. |
| Qwen/Qwen2.5-7B-Instruct | Following the思路provided by the elder friend's post, I also raised an account pool... |
严重
[流畅性]
"the思路"
理由: Mixed script: English word '思路' retained in Chinese instead of being translated to 'thoughts' or 'approach' | Chinese and English are mixed unnecessarily; the text should be expressed in a single language. | Mixed language - Chinese character '思路' left untranslated in English text
严重
[准确性]
"elder friend's post"
理由: '佬友' is internet slang for fellow users/friends, not specifically 'elder friend' | The hypothesis is in English instead of the target language (zh). Additionally, 'elder friend' is a mistranslation of the slang '佬友' (fellow enthusiast/expert), and '思路' was left untranslated. | “佬友” is a form of address for forum veterans/“old bros”, leaning toward an online honorific; “elder friend” is unnatural and departs from the original tone, which is closer to “big bro” or “veteran”. |
| Qwen/Qwen3-14B | Following the ideas provided in my friend's posts, I also raised an account pool... |
轻微
[准确性]
"my friend's posts"
理由: '佬友' is internet slang for 'fellow users/buddies' not specifically 'my friend'. The possessive 'my' is not present in source. | 佬友 refers to fellow forum users or peers, not necessarily the speaker’s personal friend. | The translation does not fully capture the slang nuance of '佬友' which implies a close-knit group of friends. | '佬友' is specific community slang for 'fellow experts/veterans', not just 'friend'. 'My friend' loses the nuance of peer expertise. |
| Qwen/Qwen3.5-4B | Following the ideas provided by fellow users, I also set up an account pool... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Following the ideas provided in Lao You's posts, I also raised an account pool... |
严重
[准确性]
"Lao You"
理由: 佬友 is internet slang for 'buddy/friend' not a proper name 'Lao You'; should be translated as 'buddy' or 'friend' | The hypothesis is in English, but the target language is Chinese. Additionally, 'Lao You' is a mistranslation of '佬友' (a slang term for fellow users/experts in the community), and 'raised an account pool' is an awkward literal translation of '养了个号池' (cultivated a pool of accounts). | “佬友” here means other forum members/peers, not a proper name; treating it as a name changes meaning. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Following the idea provided by your friend's post, they also maintained an account pool... |
严重
[准确性]
"your friend's post"
理由: Mistranslation: '佬友' is slang for senior/experienced user, not 'friend' | '佬友' is internet slang for 'fellow users/buddies', not 'your friend'. Should be 'Following the idea provided in a fellow user's post'. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. | “佬友” refers to forum veterans/fellow forum members, not “your friend”; the subject should also be “I” rather than “they”, and “养了个号池” means maintaining an account pool oneself. |
| google/gemma-3-12b-it | Following the ideas provided in the senior forum post, I also set up an account pool... | 通过 (无共识错误) |
| google/gemma-3-1b-it | "en": "The thought process provided by the 'Follow League' post about the token pool...", "zh-TW": "跟着大佬的帖子中提供的思路,也是养了个号池...", "ja": "それに、その大佬の投稿で提供された思考プロセスは、また、アカウントプールを飼うことでもありました...", "ko": "그녀의 '다음 룩' 게시물에서 제공된 생각 방식은 또, 계정 풀을 키우는 것과 같았습니다...", "de": "Die Gedankenprozesse, die von der 'Follow League' Gruppe im Post über den Token Pool bereitgestellt wurden, sind auch ein Versuch, eine Zahlensammlung aufzubauen..." "fr": "Le processus de pensée fourni par le post du 'Suivre les loups' concernant le pool d'actifs...", "it": "Il processo di pensiero fornito dal post di 'Seguire i Lupi' riguardo al pool di asset..." } |
严重
[准确性]
"Follow League"
理由: Incorrect translation of '佬友' (slang for senior/experienced user) as 'Follow League', and '号池' (account pool) as 'token pool' - completely mistranslated the slang terms | Mistranslation of '佬友' (slang for experienced users/friends) as 'Follow League'. | Incorrect translation of '佬友' as 'Follow League' instead of 'elders' or 'seniors.' |
| google/gemma-3-4b-it | Following the idea provided in the post by the senior friend, we also set up an account pool... | 通过 (无共识错误) |
| google/translategemma-12b-it | Following the ideas provided in the post by experienced users, I also created an account pool... | 通过 (无共识错误) |
| google/translategemma-4b-it | Following the idea provided in the post by my friend, I also set up an account pool... |
轻微
[准确性]
"my friend"
理由: “佬友” is online slang for fellow forum users / buddies, not necessarily a personal friend. | 佬友 is internet slang for 'old friend/buddy' not specifically 'my friend'; context suggests it's a general reference to someone else's post | Mistranslation of '佬友' (a slang term for fellow community members/experts) as 'my friend'. The source implies a community context, not a personal friendship. |
| tencent/HY-MT1.5-1.8B | 按照老友在帖子里提到的方法,我也構建了個賬戶池... | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Following the ideas provided in the post by that friend, I also created an account pool... | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | *** is now open! |
轻微
[准确性]
"*** is now open!"
理由: The translation does not include '公益站' (public station), making the message unclear. | The source “【***】公益站今天正式开放啦!” contains “公益站” (charity/public service site) and “今天正式开放” (officially open today); the translation omits both “公益站” and “today”, so the meaning is incomplete. | Missing translation of '公益站' (public welfare/charity site). The translation is incomplete and loses important context. | Omission of '公益站' (charity/public welfare station), losing specific information about the type of entity. |
| CohereLabs/tiny-aya-water | *** Free API Site is now open! |
严重
[准确性]
"*** Free API Site"
理由: Over-translation and hallucination; '公益站' means 'charity/public welfare station', not specifically 'Free API Site'. | The term '公益站' refers to a public service website, not an API site. | '公益站' means 'public welfare/charity site', not 'Free API Site'. The translation adds unwarranted specificity. | The source “【***】公益站” is a public-welfare / charity site, not a “Free API Site”. Mistranslating “公益站” as “API Site” changes the nature of the site. |
| Qwen/Qwen2.5-14B-Instruct | The free shared API site is officially open today! |
严重
[准确性]
"The free shared API site"
理由: Over-interpretation. '公益站' literally means 'public welfare site' or 'charity site'. While it might offer free APIs, adding 'shared API' is an assumption not present in the source text. | Incorrect translation direction. Source is Chinese, hypothesis is English. Should translate FROM Chinese TO target language. | Source only indicates a public/charity site; API and ‘free shared’ are additions not supported by the source. |
| Qwen/Qwen2.5-7B-Instruct | [***] free API public proxy today officially open! |
严重
[准确性]
"free API public proxy"
理由: Inaccurate translation: '公益站' (charity/free site) incorrectly rendered as 'free API public proxy' | '公益站' means 'public welfare/charity site', not 'free API public proxy'. Grammar also awkward. | The hypothesis is in English instead of the target language (zh). The term '公益站' (charity/public welfare station) is mistranslated as 'free API public proxy', adding specific technical details not present in the source. | The source contains only “公益站”, i.e. a site of a public-welfare nature; “API” and “proxy” do not appear in it and are fabricated information. |
| Qwen/Qwen3-14B | [***] The free shared API site is officially open today! |
严重
[准确性]
"The free shared API site"
理由: “公益站” means a public-welfare/charity site; “free shared API” adds specificity not present in the source and changes the meaning. | '公益站' means 'public welfare/charity site', not 'free shared API site'. The translation adds specificity not present in source. | '公益站' implies a non-profit or public welfare station. Translating it as 'shared API site' adds specific technical details ('API', 'shared') not present in the source and potentially misrepresents the nature of the station. |
| Qwen/Qwen3.5-4B | 【***】The free shared API site is now officially open! |
严重
[准确性]
"The free shared API site"
理由: 公益站 means 'public welfare/charity site', not specifically 'free shared API site'. The translation adds unwarranted specificity. | Source only says “公益站” (public-welfare / charity site) and does not specify ‘API’ or ‘shared API’. ‘公益’ implies non-profit/charitable, not specifically an API service. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| Qwen/Qwen3.5-9B | The free shared API site is officially open today! |
严重
[准确性]
"The free shared API site"
理由: “【***】公益站” is a public-benefit/charity site; adding “free shared API” is unjustified and changes meaning. | 公益站 means 'public welfare/charity site' not 'free shared API site'; the translation adds unwarranted specificity | The hypothesis is in English instead of the required target language (Chinese). It also hallucinates specific details ('API') not present in the source '公益站' (public welfare/charity station). |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Free API public proxy service has officially launched today! |
严重
[准确性]
"Free API public proxy service"
理由: The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. | The source sentence is “【***】公益站今天正式开放啦!”, which only states that a public-welfare site has opened; it does not mention “API” or “proxy service”, which are fabricated information. | Mistranslation: '公益站' means 'charity/public welfare site', not 'Free API public proxy service' | The translation adds 'API' and 'proxy service' which are not present in the source. | Over-interpretation. '公益站' means 'public welfare/charity site', not specifically 'Free API public proxy service'. The translation adds unwarranted specificity. |
| google/gemma-3-12b-it | 【***】The free API site is officially open today! |
严重
[准确性]
"【***】The free API site is officially open today!"
理由: Mismatch with target language: the required language is zh, but the hypothesis is in English. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. | The term '公益站' might not always refer to a free API site depending on context. |
| google/gemma-3-1b-it | 公益站 today officially opened! | 通过 (无共识错误) |
| google/gemma-3-4b-it | 【***】Free Shared API Sites are officially open today! |
严重
[准确性]
"Free Shared API Sites"
理由: Mistranslation of '公益站' - '公益' means 'charity/welfare' not 'free shared' | 公益站 means 'public welfare/charity site', not 'Free Shared API Sites'. Hypothesis adds unwarranted specificity not in source. | The hypothesis is in English, but the target language is Chinese. Additionally, '公益站' (charity/public welfare station) is mistranslated as 'Free Shared API Sites'. | The source just says “【***】公益站今天正式开放啦!” meaning a public-welfare/charity site. “Free Shared API Sites” adds “API” and plurality not present in the source and distorts the meaning. |
| google/translategemma-12b-it | 【***】The public API site is officially open today! |
严重
[准确性]
"public API site"
理由: Mistranslation: '公益站' refers to a 'charity/public welfare site' not 'public API site' | Mistranslation. '公益站' means 'public welfare/charity site' not 'public API site'. The term refers to a free service site. | “公益站” is a website or site of a public-welfare nature and has nothing to do with “API”; adding “API” changes the meaning of the source. | The term '公益站' translates to 'charity station' or 'non-profit site'. The hypothesis incorrectly translates it as 'public API site', changing the meaning entirely. |
| google/translategemma-4b-it | 【***】Free Shared API Site is officially open today! |
严重
[准确性]
"Free Shared API Site"
理由: Over-translation and hallucination. The source '公益站' means 'public welfare site' or 'charity site'. The hypothesis adds 'Shared API' which is not present in the source text. | 公益站 means 'public welfare/charity site' not 'Free Shared API Site'; the specific service type is not mentioned in source | “公益站” is a public-welfare / charity site; adding “API” is ungrounded and changes meaning. |
| tencent/HY-MT1.5-1.8B | Translation of '公益站今天正式开放啦!' in en |
致命
[准确性]
"Translation of '公益站今天正式开放啦!' in en"
理由: Meta-text describing a translation instead of translating the source into the target language; meaning not conveyed in zh. | The hypothesis is a meta-commentary describing the translation task instead of providing the actual translation into the target language (zh). | Hypothesis is not a translation but a placeholder indicating translation was attempted but not provided | The hypothesis does not provide a translation but merely states that a translation is needed. | Hypothesis is not a translation but a meta-description. Should provide actual English translation of the Chinese source text. |
| tencent/HY-MT1.5-7B | 【***】The public welfare site is officially open today! | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Deeply experienced the newly launched product, advice to avoid --- |
严重
[流畅性]
"Deeply experienced the newly launched product, advice to avoid ---"
理由: Awkward phrasing. Should be 'advice to avoid it' or 'recommendation to steer clear'. Also '建议避雷' is internet slang meaning 'recommend avoiding/warning about pitfalls'. | Unnatural collocation; 'experienced' is rarely used as a transitive verb in this context in English. | “Deeply experienced” is unnatural and should be “tried out” or “tested thoroughly”; “advice to avoid” should be “advise avoiding it” or “recommend avoiding it”; the grammar and collocation are unidiomatic. |
| CohereLabs/tiny-aya-water | Deeply experienced the newly launched product, advice to avoid --- |
严重
[流畅性]
"advice to avoid"
理由: Grammatical error; it should be “advise to avoid it” or “my advice is to avoid it”. The current phrase is not natural English. | Grammatically incorrect; should be 'advise avoiding' or 'recommendation to avoid'. '避雷' is slang for 'avoid pitfalls/bad products'. | Incomplete translation: '建议避雷' means 'recommend to avoid it' (warning others not to use), but hypothesis omits what to avoid |
| Qwen/Qwen2.5-14B-Instruct | Thoroughly experienced the newly launched product, recommend avoiding it--- | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Deeply experienced the product just launched*, suggest avoiding--- |
轻微
[准确性]
"suggest avoiding---"
理由: The internet slang “避雷” carries the sense of “steering clear of pitfalls”, with a stronger warning tone than a plain “avoid”; it could be rendered as “strongly recommend avoiding it” to preserve the tone. | The hypothesis is in English instead of the target language (zh). The slang '避雷' (avoid pitfalls/scams) is translated literally as 'avoiding', losing the specific connotation of warning others against a bad product. | '建议避雷' means 'recommend avoiding/warning against', but context suggests this is incomplete translation of internet slang | The translation is correct but could be more natural. |
| Qwen/Qwen3-14B | Had a deep experience with the product just launched, recommend avoiding it--- | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | I deeply experienced the product just launched and recommend avoiding it. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Had an in-depth experience with the newly launched product and recommend avoiding it. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | After deeply experiencing the newly released product, it is advisable to avoid it. |
严重
[准确性]
"it is advisable to avoid it"
理由: '建议避雷' is internet slang meaning 'recommend avoiding/warning against', but the translation misses the colloquial tone. Should be 'recommend avoiding it' or 'heads up, avoid this'. | Incomplete translation: '避雷' means 'avoid getting scammed' or 'stay away from this trap', but the translation omits the negative connotation and the tone of warning | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-12b-it | I've had a deep experience with the product just *launched*, suggestion to avoid it. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Experienced a deep experience with the recently released product. |
严重
[准确性]
"Experienced a deep experience with the recently released product."
理由: Only conveys “深度体验了…产品” (tried the product in depth), omitting the key information in the second half, “建议避雷---”, which advises everyone to steer clear of/not use it; the meaning is seriously incomplete. | Missing translation of '建议避雷' which means 'avoid this product.' | Missing the core meaning of '建议避雷' (suggest avoiding/warning against). The translation implies a neutral or positive experience, whereas the source is a warning. | Missing the second part '建议避雷' (suggesting to avoid it/steer clear) - significant meaning loss
轻微
[流畅性]
"Experienced a deep experience"
理由: “Experienced a deep experience” is redundant and unnatural; it could be changed to “Thoroughly tried out” or “Had a deep dive into”. | Redundant phrasing 'deep experience' | Redundant phrasing 'experienced a deep experience'. Should be 'had an in-depth experience' or similar. | Redundant and unnatural phrasing. |
| google/gemma-3-4b-it | "en": "Deeply experienced a newly *launched product, recommend avoiding it—", "zh-TW": "深度體驗了一番剛推出\*的產品,建議避雷—", "ja": "新発売の製品を深く体験しましたが、避けることをお勧めします—", "ko": "새로 출시된 제품을 깊이 경험했지만, 피하는 것이 좋습니다—", "de": "Ich habe ein kürzlich *veröffentlichtes Produkt tief ausprobiert und rate davon ab—", "fr": "J'ai profondément testé un produit récemment *lancé et je vous conseille d'éviter—", "it": "Ho profondamente provato un prodotto recentemente *rilasciato e consiglio di evitarlo—" } <\|im_end\|> | 通过 (无共识错误) |
| google/translategemma-12b-it | I've had a chance to thoroughly experience the product that was just *released*, and I suggest avoiding it. | 通过 (无共识错误) |
| google/translategemma-4b-it | Download Now |
致命
[准确性]
"Download Now"
理由: Complete mistranslation. The source says '深度体验了一把刚刚*推出的产品,建议避雷' meaning 'Had a deep experience with the product just launched by *, recommend avoiding it'. The hypothesis 'Download Now' is completely unrelated to the source meaning. | The translation does not match the source text, which is a review of a product. | Hypothesis does not translate source. Source discusses experiencing a product and recommending to avoid it, but hypothesis is 'Download Now' | Complete hallucination. The translation bears no relation to the source text, which warns users to avoid a newly launched product ('suggest avoiding lightning/pitfalls'). | Unrelated to source; original says they deeply tried a newly launched product and recommend avoiding it. |
| tencent/HY-MT1.5-1.8B | Deep experience with the newly launched product, suggestions for avoiding pitfalls--- |
严重
[准确性]
"Deep experience with the newly launched product, suggestions for avoiding pitfalls---"
理由: The hypothesis is in English, but the target language is Chinese (zh). The text was not translated. | Incorrect translation: '深度体验' is slang meaning 'tried extensively/had deep experience' but should be understood in context of product review; '避雷' is slang meaning 'avoid scams/ripoffs' not literal 'pitfalls' | Misrepresents the evaluative stance; source says ‘strongly experienced the newly launched product, recommend avoiding it (negative),’ while hypothesis sounds neutral/informative, losing the warning/negative recommendation. | Missing subject and awkward phrasing. Should be 'I had a deep experience with the newly launched product' or similar. |
| tencent/HY-MT1.5-7B | I had a deep experience with the product that was just *launched*, so I suggest avoiding it --- | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | "en": "Can’t abuse this subscription today?", " " "zh-TW": "今天這個訂閱不能白50嗎?", " " "ja": "このサブスクリプションは今日利用できないのか?", " " "ko": "오늘 이 구독을 남용할 수 없나요?", " " "de": "Kann man diesen Abo heute nicht ausnutzen?", " " "fr": "Peut-on abuser de ce abonnement aujourd'hui ?", " " "it": "È possibile sfruttare questo abbonamento oggi?" } | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Can’t you abuse this subscription today? |
严重
[准确性]
"abuse this subscription"
理由: Mistranslation: '蹬' in this context is slang for 'to use/exploit' (as in taking advantage of a subscription), not 'abuse' | Slang such as “蹬(灯)号/订阅” usually means “薅/薅羊毛/白嫖/薅额度” (freeloading), i.e. using up or milking free benefits, not “abuse” in the sense of insulting/mistreating. The meaning deviates considerably. | The slang term '蹬' means to stop using, not to abuse. | Severe mistranslation of '蹬' (likely a typo for '登' meaning login, or slang for kick/use). 'Abuse' changes the meaning entirely and negatively. | '蹬' in this context is internet slang meaning 'to use/exploit', not 'abuse'. The translation fundamentally misrepresents the meaning and tone. |
| Qwen/Qwen2.5-14B-Instruct | Is this subscription not available for abuse today? |
严重
[准确性]
"for abuse"
理由: Incorrect translation of '蹬' which in this context likely means 'use' or 'access', not 'abuse' - this appears to be a misunderstanding of internet slang | Severe mistranslation of '蹬' (deng). In this specific proxy/subscription slang context, '蹬' likely means 'kick' (as in kicked out/disconnected) or 'use actively'. 'Abuse' is completely incorrect and changes the meaning negatively. | Incorrect translation direction. Source is Chinese, hypothesis is English. Should translate FROM Chinese TO target language. | “蹬订阅” in this context means cancel/unsubscribe, not ‘abuse’; the action is misinterpreted. |
| Qwen/Qwen2.5-7B-Instruct | This subscription, can't be abused today? |
严重
[准确性]
"can't be abused today"
理由: Wrong meaning: '蹬' (exploit/abuse) incorrectly translated as 'abused' - context suggests 'used' or 'cancelled' | The hypothesis is in English instead of the target language (zh). The slang '蹬' (likely meaning to kick/start/use aggressively or a typo for '登' login) is mistranslated as 'abused', which changes the meaning significantly. | '蹬' in this context is internet slang meaning 'to use/exploit', not 'abused'. Full meaning unclear without context. | “蹬订阅” means to cancel/unsubscribe; here “不能蹬了吗” means “can't it be cancelled/unsubscribed today?”, not “can't it be abused”; the meaning is misunderstood. |
| Qwen/Qwen3-14B | Can't abuse this *subscription today? |
Major
[Accuracy]
"Can't abuse this *subscription today?"
Rationale: “蹬” here likely means “use/ride” (slang like ‘hop on/ride a sub’), not “abuse”; “abuse” adds unjustified negative intent. | Incorrect translation of '蹬' - in this context '蹬' is internet slang meaning 'to cancel/terminate' (like canceling a subscription), not 'abuse' | The translation does not fully capture the slang nuance of '蹬' which implies using or abusing something. | '蹬' in this context is internet slang meaning 'exploit/abuse' but the translation is awkward. More importantly, the question mark placement and overall meaning is unclear - should convey 'can't exploit/use this subscription today?' | '蹬' in this context likely means 'use up', 'exhaust', or 'kick/start' depending on specific subculture slang, but 'abuse' carries a negative connotation of misuse that may not match the user's intent of simply utilizing the subscription fully. |
| Qwen/Qwen3.5-4B | Can't we abuse this subscription today? |
Major
[Accuracy]
"abuse this subscription"
Rationale: 蹬 in this context is internet slang meaning 'exploit/take advantage of', not 'abuse'. The translation is inaccurate and the meaning is unclear. | The source '蹬' is a typo/slang for '用' (use), not 'abuse'. The translation incorrectly interprets it as 'abuse' when it means 'use' in this context. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. Additionally, '蹬' (deng) likely means 'kick/start' or is a typo for '登' (login), whereas 'abuse' is an unjustified interpretation. | “蹬” here likely means ‘log in/use (the subscription)’ in slang, not ‘abuse’. The hypothesis overstates malicious usage not present in the source. |
| Qwen/Qwen3.5-9B | Can't I abuse this subscription today? |
Major
[Accuracy]
"abuse this subscription"
Rationale: Translation error: 蹬 (dēng) in context likely means 'to use' or 'to access', not 'abuse' | “蹬” here likely means to use/ride on (take advantage of) a subscription slot, not to abuse it; nuance is distorted. | The hypothesis is in English. Furthermore, '蹬' in this context likely means 'kick/start working' or is a typo for '登' (log in) or '用' (use), whereas 'abuse' implies malicious overuse which may not be the intended meaning of the slang. The primary error is the wrong language. | 蹬 in this context is internet slang meaning 'to exploit/take advantage of' not 'abuse'; 'Can't I exploit this subscription today?' is more accurate |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | This subscription, can't we stop it today? |
Major
[Accuracy]
"can't we stop it today?"
Rationale: '蹬' in this context is internet slang, likely meaning 'use/access'. The translation 'stop it' is incorrect. Should be something like 'can't we use it today?' or 'is it down today?' | Mistranslation: '蹬' is slang for 'cancel/unsubscribe', not 'stop' in general sense | “蹬订阅” is an online expression for 'cancel/unsubscribe'; the sentence asks 'can't it be cancelled today?', not a 'we'-phrased 'can't we stop it?'. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-12b-it | Can I no longer abuse this subscription today? |
Major
[Accuracy]
"Can I no longer abuse this subscription today?"
Rationale: The word '蹬' in the source is internet slang meaning 'cancel' or 'unsubscribe', not 'abuse' | Mismatch with target language: the required language is zh, but the hypothesis is in English. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-1b-it | "en": "Is this subscription not allowed to abuse?", "zh-TW": "这个订阅今天是否可以 Abuse?", "ja": "このサブスクリプションは今日、どのように Abuse を行うことができるのか?", "ko": "이 구독은 오늘 어떻게 Abuse를 이용할 수 있을까요?”, “이 구독은 오늘 어떻게 Abuse를 이용할 수 있을까요?” "de": "Ist diese Abonnements nicht erlaubt, um zu abusieren?", "fr": "Cette abonnements n'est pas autorisé pour abusser ?", "it": "Questo abbonamento non è permesso per abusare?" } |
Major
[Accuracy]
"Is this subscription not allowed to abuse?"
Rationale: Incorrect translation of '蹬了吗' as 'abuse' instead of 'can't be used.' | Incorrect translation of '蹬' (slang for using/exploiting) as 'abuse' - wrong interpretation of the slang term | The source “今天不能蹬了吗?” is colloquial for “今天不能薅/薅羊毛/白嫖了吗” ('can't we freeload today?'), meaning the subscription can no longer be used/milked today; the translation misreads it as 'allowed to abuse', which does not match the meaning. | Mistranslation of '蹬' (slang for kicking off/removing/unsubscribing) as 'abuse'. The context implies losing access, not abusing the service. |
| google/gemma-3-4b-it | This subscription, can’t I abuse/exploit/milk it today? |
Major
[Accuracy]
"abuse/exploit/milk"
Rationale: The hypothesis is in English, not Chinese. Furthermore, '蹬' (deng, slang for kicking off/leaving/canceling) is severely mistranslated as 'abuse/exploit/milk'. | “蹬” in this context is more like ‘use’/‘leverage’ (possibly in a gray/advantage-taking way), but the triple verb phrase over-specifies and introduces stronger connotations like “abuse” that may not always be implied. | 蹬 in this context means to 'take advantage of' or 'exploit', but the translation options are vague and imprecise. The question structure is also unclear in translation. |
| google/translategemma-12b-it | Is this subscription no longer exploitable today? |
Major
[Accuracy]
"no longer exploitable"
Rationale: Here “蹬” is more likely a homophone or typo for “登录/登上去” (log in/get on) or “续费/使用” (renew/use); 'exploitable' introduces the sense of exploiting a loophole, an unwarranted addition that changes the original meaning. | Mistranslation: '蹬' in this context is slang for 'cancelling' or 'terminating' a subscription, not 'exploiting' | The slang '蹬' in this context likely means 'to kick' (as in kicked out/banned) or 'to work/function'. 'Exploitable' implies finding a security loophole, which does not match the likely meaning of the service being down or the user being removed. | Mistranslation of '蹬'. In this context, '蹬' is internet slang meaning 'to use/exploit (a service)'. The translation 'exploitable' is awkward and unclear. Should be 'Can't use this subscription today?' or 'Is this subscription down today?' |
| google/translategemma-4b-it | Can I not 'abuse' this subscription today? |
Major
[Accuracy]
"Can I not 'abuse' this subscription today?"
Rationale: “蹬订阅” here is likely slang for using up / riding on a subscription; the notion of “abuse” adds strong negative meaning not present in source. | 蹬 in this context means 'to use excessively/exploit' not 'abuse'; the translation is awkward and unclear | Mistranslation of the slang '蹬' (dēng). In this context, it likely means 'to kick/start working' or 'to use actively', whereas 'abuse' implies malicious overuse, changing the nuance significantly. | Incorrect translation. The source '蹬' is internet slang meaning 'to use/exploit' (as in taking advantage of a subscription). The hypothesis incorrectly translates it as 'abuse' which has a negative connotation, whereas the slang usage here is more neutral meaning 'to use' or 'to enjoy'. |
| tencent/HY-MT1.5-1.8B | This *subscription, can't pedal today, right? |
Major
[Accuracy]
"can't pedal today, right?"
Rationale: “蹬” here is slang for cancel/unsubscribe, not ‘pedal’; the hypothesis is semantically incorrect. | '蹬' in this context is internet slang meaning 'to use/access', not 'pedal'. The translation is completely incorrect. | The term '蹬' is incorrectly translated as 'pedal' instead of 'log in' or 'access'. | The hypothesis is in English, but the target language is Chinese (zh). Additionally, '蹬' (slang for working/running) was mistranslated literally as 'pedal'. | Incorrect translation: '蹬' is internet slang meaning 'cancel/terminate' (from '蹬掉'), not literal pedaling |
| tencent/HY-MT1.5-7B | Can't we abuse this subscription today? |
Major
[Accuracy]
"Can't we abuse this subscription today?"
Rationale: Incorrect translation: '蹬' (to ride/pedal) was mistranslated as 'abuse' instead of its slang meaning related to using/exploiting a subscription | '蹬' in this context is internet slang meaning 'exploit/abuse a loophole', but the translation 'abuse' is too crude and doesn't capture the specific meaning. Also, the overall sentence structure and meaning are unclear in translation. | Severe mistranslation of the slang '蹬' (likely a typo for '登' meaning log in, or specific slang for using/exploiting in a specific context). 'Abuse' implies malicious misuse which may not be the intended meaning of the colloquial '蹬' (often meaning 'to use' or 'to get on board' in specific dialects/slang, or simply a typo for '登' - log in). Without context, 'abuse' is a high-risk hallucination of intent. If '蹬' means 'log in', this is critical. If it means 'use up', 'abuse' is still too strong. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Team self-boarding |
Major
[Accuracy]
"Team self-boarding"
Rationale: Mistranslation: '上车' is internet slang meaning 'to join/participate in something (like a deal/activity)', not literal 'boarding'. 'team自助上车' means 'team self-service join' or 'team members can join themselves' | In context, “team自助上车” mostly means 'team self-service joining/participating (in a project/group)', whereas 'self-boarding' is easily read as boarding a plane or vehicle and does not convey 'joining a project/group'. | Literal translation of '上车' (get on the bus/join the group deal) as 'boarding', losing the slang meaning of joining a shared service or deal. |
| CohereLabs/tiny-aya-water | Team self-upgrade |
Major
[Accuracy]
"Team self-upgrade"
Rationale: Mistranslation of '上车' (get on the bus/join the group/buy in) as 'upgrade'. The phrase means joining a team subscription. | '自助上车' means 'self-service boarding/joining', not 'self-upgrade'. The translation misses the meaning entirely. | The slang term '上车' means to join or get on, not to upgrade. | “自助上车” means 'joining/participating in a project or group on one's own', not 'self-upgrade'. The meaning is clearly wrong. | Mistranslation: '上车' means 'to join/get on board' (a group buy), 'team自助上车' means 'team self-service to join', not 'Team self-upgrade' |
| Qwen/Qwen2.5-14B-Instruct | team self-service boarding. |
Major
[Fluency]
"team self-service boarding."
Rationale: Literal translation of '上车' (get on the bus) which is slang for 'joining a group buy' or 'subscribing to a service'. 'Boarding' makes no sense in this context without cultural adaptation. | Literal translation that doesn't convey the meaning; '上车' in internet slang means 'join/participate (in a group buy)', not literal boarding | Unnatural English; unclear meaning. Needs a more idiomatic phrasing like ‘self-service team onboarding’ or similar depending on context. |
| Qwen/Qwen2.5-7B-Instruct | team self-boarding. |
Major
[Accuracy]
"team self-boarding."
Rationale: Incorrect translation: '上车' (join/purchase) should be 'join' or 'sign up', not 'boarding' | “自助上车” usually means joining or activating on one's own (e.g. self-service enrolment in a service/plan), not 'self-boarding a vehicle'; 'self-boarding' is overly literal and its referent is unclear. | '自助上车' is internet slang meaning 'self-service join/participate', not 'self-boarding'. Awkward phrasing. | The hypothesis is in English instead of the target language (zh). '上车' is slang for joining a service or group, not literally 'boarding' a vehicle. |
| Qwen/Qwen3-14B | team self-service boarding. | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Team self-service boarding. |
Major
[Accuracy]
"Team self-service boarding."
Rationale: The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. | 上车 is internet slang meaning 'join in/participate', not literal 'boarding'. The translation is too literal and misses the idiomatic meaning. | '上车' is internet slang meaning 'join/get access to' (like joining a group buy), not literal boarding. The translation is too literal and misses the slang meaning. | “上车” in this context is slang for ‘join (a group/project/service)’ rather than literal boarding a vehicle. “Team self-service boarding” is unnatural and fails to convey ‘team can join on their own’. |
| Qwen/Qwen3.5-9B | Self-service shared subscription slot. |
Major
[Accuracy]
"Self-service shared subscription slot."
Rationale: “team自助上车” refers to team self-service joining/boarding (e.g., join a team plan); “shared subscription slot” is an incorrect, overly specific interpretation. | 上车 is internet slang meaning 'to join/get on board' - the translation should be 'Self-service group purchase' or 'Self-service shared account' | The hypothesis is in English. '上车' is slang for 'joining a group buy/shared subscription,' which is interpreted correctly in meaning but the output is in the wrong language. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | team self-check-in |
Major
[Accuracy]
"team self-check-in"
Rationale: Mistranslation: '上车' is slang meaning 'to join/purchase (a group buy)', not 'check-in' | Here “自助上车” is a metaphor for joining/activating a service on one's own, usually tied to buying an account/service, not airport or hotel 'self-check-in'. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-12b-it | team self-service boarding. | Pass (no consensus errors) |
| google/gemma-3-1b-it | "en": "Team self-ride", "zh-TW": "Team 自助上车", "ja": "チーム自乗", "ko": "팀 자가 탑승", "de": "Team selbstfahren", "fr": "Équipe se})\n\n<|im_end|> |
Major
[Accuracy]
"self-ride"
Rationale: Incorrect translation of '上车' (slang meaning to join/participate in a group buy) as 'ride' - wrong interpretation of the slang term | Mistranslation of '上车' (slang for joining a group buy or service) as 'self-ride'. | Incorrect translation of '自助上车' as 'self-ride' instead of 'self-service boarding.' |
| google/gemma-3-4b-it | Team members can self-board. |
Major
[Accuracy]
"Team members can self-board."
Rationale: “自助上车” in online context means self-service joining/using a service/project, not literal boarding. Translation is confusing and not aligned with the idiomatic meaning. | The hypothesis is in English, not Chinese. '上车' (get on the bus) is slang for joining a service/group, which 'self-board' captures literally but awkwardly, yet the primary error is the wrong language. | Mistranslation of '上车' - internet slang meaning 'to join/get access to' not literal 'board' |
| google/translategemma-12b-it | Team self-service boarding. |
Major
[Accuracy]
"Team self-service boarding."
Rationale: In online usage “上车” means 'joining/participating in a project or group', not physical 'boarding'; “team自助上车” should be something like 'team can join by themselves'; the current translation is basically unintelligible and misleading. | Mistranslation: '上车' is slang meaning 'to join/purchase' (like getting on a bus), not literal boarding. 'Team自助上车' means 'team self-service purchase/join' | Mistranslation of internet slang. '上车' means 'join/get on board (a group activity)' not 'boarding'. Should be 'Team self-service sign-up' or 'Team self-service join'. |
| google/translategemma-4b-it | Team self-boarding. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Team Self-Parking |
Minor
[Accuracy]
"Team Self-Parking"
Rationale: '自助上车' means 'self-service boarding/joining', not 'self-parking'. The translation is completely incorrect. | Incorrect translation: '上车' is internet slang meaning 'join/participate' (like getting on a train), not literal parking. Should be 'Team self-join' or similar | “上车” in this context means to join/get onboard (e.g., join a deal), not literal ‘self-parking’; key meaning is wrong. | The hypothesis is in English, but the target language is Chinese (zh). '上车' (slang for joining a service/team) was mistranslated literally as 'Self-Parking'. |
| tencent/HY-MT1.5-7B | Team self-service boarding. | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | La pagina che hai richiesto non esiste. |
Critical
[Accuracy]
"La pagina che hai richiesto non esiste."
Rationale: The translation is in Italian and does not match the source content. | Completely wrong output: The hypothesis is in Italian and means 'The page you requested does not exist' - this is not a translation of the Chinese source at all. The source says '无限续杯项目正式失效,被官网封杀了' meaning 'The unlimited refill project is now officially invalid, blocked by the official website' | Hallucination/Wrong language: The output is Italian ('The page you requested does not exist') instead of Chinese, and the content is unrelated to the source text about a project being banned. | Completely wrong translation. Response is in Italian instead of English. Should translate the Chinese source about an unlimited refill project being discontinued. | The source is a status note, “【**】无限续杯项目正式失效,被官网封杀了” (the unlimited refill project has officially stopped working and was banned by the official site); the translation is Italian for 'the page you requested does not exist', which is entirely unrelated. |
| CohereLabs/tiny-aya-water | Please enter your email address. |
Critical
[Accuracy]
"Please enter your email address."
Rationale: Entirely unrelated to the source “【**】无限续杯项目正式失效,被官网封杀了。”; the source says the unlimited refill project stopped working and was banned by the official site, while the translation has completely different content, a severe content substitution. | Completely unrelated translation: Source discusses '无限续杯项目' (unlimited refill project) being blocked, hypothesis is completely different content | Complete mistranslation. The source discusses a project being discontinued, not requesting email input. This is a critical accuracy failure. | Complete hallucination; the translation bears no relation to the source text which discusses a project being banned. | The translation is unrelated to the source text. |
| Qwen/Qwen2.5-14B-Instruct | The unlimited refill project has officially become invalid and was blocked by the official website. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | 【**】Infinite Refill Project is officially invalid and banned by the official website. |
Major
[Accuracy]
"Infinite Refill Project"
Rationale: Inaccurate: '无限续杯' (unlimited refill) is internet slang for subscription/service, not literal 'project' | “无限续杯项目” leans toward 'unlimited refill plan/campaign/program', usually in a service or promotion context; rendering it literally as 'Project' is somewhat stiff, and 'unlimited refills program/scheme' would work. | The hypothesis is in English instead of the target language (zh). '无限续杯' (infinite refill) is a metaphor for unlimited usage, which is translated too literally. |
| Qwen/Qwen3-14B | **The infinite refilling project has officially expired and was banned by the official website. | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | The Unlimited Refill Project has officially expired and been banned by the official website. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | **Infinite Refill Project officially invalidated and banned from the official website. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The 'Unlimited Refill Project' has officially become invalid and has been banned by the official website. | Pass (no consensus errors) |
| google/gemma-3-12b-it | The [**] Unlimited Refill Cup program has officially expired and been shut down by the official website. | Pass (no consensus errors) |
| google/gemma-3-1b-it | The Unlimited Refresh Project has officially expired on the website. |
Major
[Accuracy]
"has officially expired on the website"
Rationale: Mistranslation of '被官网封杀了' (banned/blocked by the official website) as 'expired on the website'. '封杀' implies an active ban, not natural expiration. | The source “被官网封杀了” means 'banned/blocked by the official site', not 'expired on the website'; the translation misreads a ban as natural expiry, so the meaning is wrong. | Incorrect translation of '无限续杯' (slang for unlimited renewal/refill) as 'Unlimited Refresh Project' and '被官网封杀' (banned by official website) as 'expired on the website' - completely wrong interpretation | Inaccurate translation. '被官网封杀' means 'was killed/banned by the official website', not 'expired on the website'. Should convey that the project was officially shut down/banned. |
| google/gemma-3-4b-it | The ‘Infinite Cup’ project has officially expired and been banned by the official website. | Pass (no consensus errors) |
| google/translategemma-12b-it | The "unlimited refill cup" program has officially been discontinued and removed from the official website. | Pass (no consensus errors) |
| google/translategemma-4b-it | "en": "The \"Unlimited Cup\" project has officially been terminated and banned by the official website.", "zh-TW": "「無限杯」專案已正式終止並遭到官方網站封殺。", "ja": 「無限杯」プロジェクトは、公式ウェブサイトによって正式に終了し、禁止されました。", "ko": "『무한컵』 프로젝트는 공식 웹사이트에서 공식적으로 종료 및 금지되었습니다.", "de": "Das Projekt \"Unbegrenzte Tasse\" wurde offiziell beendet und von der offiziellen Website verboten.", "fr": "Le projet « Tasse illimitée » a été officiellement annulé et interdit par le site officiel.", "it": "Il progetto « Tazza illimitata » è stato ufficialmente terminato e bloccato dal sito ufficiale." } <|im_start|> | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | The unlimited refill program has officially expired and was banned by the official website. |
Major
[Accuracy]
"expired"
Rationale: The hypothesis is in English, but the target language is Chinese (zh). | “正式失效” is closer to ‘officially became invalid/ceased to work’ rather than ‘expired’ from natural causes; nuance of being made invalid is slightly lost. | '失效' means 'became invalid/stopped working', not 'expired'. The nuance is different. |
| tencent/HY-MT1.5-7B | The [**] unlimited refill project has been officially discontinued and blocked by the official website. | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Is there still any place for using big banana now? |
Major
[Accuracy]
"big banana"
Rationale: Literal translation of '大香蕉', which is likely slang or a codename. Without context, a literal translation often fails to convey the intended referent (e.g., a specific tool or service), appearing nonsensical. | “大香蕉” here should be a nickname/codename for some tool or service; the literal 'big banana' reduces it to the fruit and does not convey that it stands for a product/service. | Mistranslation: '大香蕉' is likely slang for a specific platform/service, not literal 'big banana'. The translation is too literal and loses the actual meaning | '大香蕉' is internet slang/code name for a service/product, not literally 'big banana'. Translation is too literal and loses meaning. |
| CohereLabs/tiny-aya-water | Is there still any place for using big banana now? |
Major
[Accuracy]
"big banana"
Rationale: '大香蕉' is internet slang for a specific service/product, not literally 'big banana'. The translation is too literal and misses the reference. | “大香蕉” is a codename/nickname alluding to some service or product, not a literal 'big banana'. The literal translation causes serious misunderstanding. | Literal translation of slang '大香蕉' (likely referring to a specific tool, VPN, or service) loses the intended referent. |
| Qwen/Qwen2.5-14B-Instruct | Are there still places where people are using big bananas? |
Major
[Accuracy]
"big bananas"
Rationale: The slang '大香蕉' refers to Google, not actual bananas. The translation does not capture this metaphor. | Treats “大香蕉” literally; in slang this likely refers to a specific tool/service or entity, not actual bananas. | Incorrect translation direction. Source is Chinese, hypothesis is English. Should translate FROM Chinese TO target language. | Failure to translate slang/code. '大香蕉' (Big Banana) is almost certainly a codename for a specific service, tool, or proxy (likely related to 'Banana' VPN or similar). Translating it literally destroys the meaning. |
| Qwen/Qwen2.5-7B-Instruct | Is there still anyone using big bananas? |
Major
[Accuracy]
"big bananas"
Rationale: '大香蕉' is internet slang/code word, not literal 'big bananas'. Translation misses the intended meaning. | The hypothesis is in English instead of the target language (zh). '大香蕉' is likely a codename or slang for a specific service/tool, translated literally loses the reference. '佬友们' (friends/experts) is omitted. | “大香蕉” is a specific codename/argot (usually for a service/tool/site); translated literally it will be read as the fruit, losing the original metaphor and referent. | Wrong translation: '大香蕉' is likely slang for a platform/service, not literal bananas |
| Qwen/Qwen3-14B | Are there still places where friends use big bananas? |
Major
[Accuracy]
"friends use big bananas"
Rationale: The translation does not capture the slang meaning of '大香蕉' which refers to a specific service or product, not the literal fruit. | “大香蕉” here is likely a slang/name for a service or product, not literal bananas; literal rendering misinterprets the slang. | '佬友们' refers to 'fellow veterans/experts' in the community. 'Friends' is too generic and loses the specific community bond implied. | '大香蕉' is internet slang (likely referring to a specific service/tool) and should not be translated literally as 'big bananas'. The meaning is completely lost. |
| Qwen/Qwen3.5-4B | Do you still use Big Banana anywhere now? |
Major
[Accuracy]
"Big Banana"
Rationale: 大香蕉 is internet slang for a specific service/platform, not literal 'Big Banana'. The translation completely misses the meaning. | “大香蕉” here is a slang/proper-noun nickname (likely for a specific service/product), not literally ‘Big Banana’. Literal rendering loses the intended referent. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| Qwen/Qwen3.5-9B | Do any of you still use Big Banana? |
Major
[Accuracy]
"Big Banana"
Rationale: The hypothesis is in English. '大香蕉' (Big Banana) is likely a codename or slang for a specific service (possibly Google or a specific tool), but the translation fails to provide the Chinese output required. | “大香蕉” is likely a slang name for a specific tool/service; literal “Big Banana” fails to convey referent and sounds like an actual banana. | 大香蕉 appears to be a slang term for a specific service or platform; translation as 'Big Banana' is too literal and loses meaning | 大香蕉 is internet slang for a specific service/product, not literally 'Big Banana'; proper translation requires understanding the reference |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | "en": "Are there still any places using Big Banana among friends?", "zh-TW": "佬友们现在还有使用大香蕉的地方吗?", "ja": "友達們現在還有使用的大香蕉的地方嗎?", "ko": "친구들 사이에서 아직도 '대바나나'를 사용하고 있는 곳이 있나요?", "de": "Gibt es immer noch Plätze, die 'Große Banane' verwenden?", "fr": "Y a-t-il encore des endroits où l'on utilise 'Gros Bananas'?", "it": "Mentre ci sono ancora posti che utilizzano il 'Grande Banana'?'} <|im_end|> |
Minor
[Accuracy]
"Big Banana"
Rationale: The hypothesis contains raw JSON data with multiple languages instead of a coherent Chinese translation. It fails to translate the source and includes irrelevant content. | In context “大香蕉” may be a nickname for a product/service; the literal 'Big Banana' is correct word for word but does not reflect that it is a codename, a minor deviation in meaning. | Hypothesis contains malformed output with multiple language translations mixed together instead of a single English translation. The format is broken and unusable. | Mistranslation: '大香蕉' is likely slang for a specific service or tool, not literal 'Big Banana'. Also contains extra multilingual content not in source. |
| google/gemma-3-12b-it | Are there still places using big bananas? |
Major
[Accuracy]
"Are there still places using big bananas?"
Rationale: The slang term '大香蕉' refers to a specific service and not actual bananas. | Mismatch with target language: the required language is zh, but the hypothesis is in English. | The term '大香蕉' is internet slang for a specific service or platform, not literal bananas. The translation fails to convey the intended meaning. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-1b-it | "en": "Are there still places using big bananas?", "zh-TW": "你们现在还有使用大香蕉的地方吗?", "ja": "今ここで大香蕉の場所はまだありますか?", "ko": "지금 대바나나를 사용하는 곳이 아직 있습니까?”, "de": "Gibt es noch Orte, wo man große Bananen verwendet?", "fr": "Y a-t-il encore des endroits où l'on utilise des bananes grandes ?", "it": "Ci sono ancora posti dove si usano banane grandi?" } |
Major
[Accuracy]
"Are there still places using big bananas?"
Rationale: Literal translation of '大香蕉' which is likely slang/code for a specific service or tool, losing the intended referent. | Incorrect translation of '大香蕉' (slang for a specific service/platform) as literal 'big bananas' - fails to recognize the slang term | In context “大香蕉” is very likely a nickname for some tool/service, not a literal big banana; the literal 'big bananas' loses the real referent and is easily misread. | Incorrect translation of '大香蕉' as literal bananas instead of a specific service. |
| google/gemma-3-4b-it | Are there still places where users are using Big Banana? |
Major
[Accuracy]
"users are using Big Banana"
Rationale: “佬友们” are the forum peers themselves, not “users” in general, and “大香蕉” is likely slang for a specific tool/service; literal “Big Banana” without context misleads. | The hypothesis is in English, not Chinese. '大香蕉' (Big Banana) is likely slang for a specific tool or service (e.g., VPN), and the translation fails to provide the Chinese text. | 大香蕉 is likely a code name or reference to a specific service/product, not literally 'Big Banana'. Translation fails to recognize this as a proper noun/slang term. |
| google/translategemma-12b-it | Are there still places where people use the 'big banana'? | Pass (no consensus errors) |
| google/translategemma-4b-it | Are there still places where users can use the big banana? |
Major
[Accuracy]
"users can use the big banana"
Rationale: Literal translation of the slang '大香蕉'. In this context, it is almost certainly a euphemism for a specific tool or service (often related to network tools), and translating it literally loses the intended meaning. | Incorrect translation. '大香蕉' is likely internet slang or a euphemism for something else (possibly a service or tool). The hypothesis translates it literally as 'big banana' which is incorrect. | “大香蕉” is likely slang/code name for a specific service or tool; literal “big banana” is misleading and unnatural in context. |
| tencent/HY-MT1.5-1.8B | Is there any place for big bananas still available for folks who use them? |
Critical
[Accuracy]
"big bananas"
Rationale: “大香蕉” is slang (likely for a tool/service), not literal bananas; literal rendering distorts meaning. | The hypothesis is in English, but the target language is Chinese (zh). '大香蕉' is likely slang that was translated literally. | '大香蕉' is internet slang (likely referring to a specific service/product), not literal 'big bananas'. The translation completely misses the meaning. |
| tencent/HY-MT1.5-7B | Do any of you still use big bananas nowadays? |
Critical
[Accuracy]
"big bananas"
Rationale: '大香蕉' is internet slang/code name for a specific service/product, not literal 'big bananas'. The translation completely misses the meaning and context. | Failure to translate slang/metaphor. '大香蕉' is likely a codename for a specific tool, service, or concept (e.g., a specific proxy or software shape/icon). Translating it literally as 'big bananas' renders the sentence nonsensical. | Incorrect translation: '大香蕉' is likely internet slang (possibly referring to a platform or service), not literal bananas |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Actually, I usually check some information, the credit limit of PING is already enough. |
Major
[Accuracy]
"the credit limit of PING"
Rationale: Mistranslation: '普号' refers to 'regular account' (普通号), not 'PING'. The translation incorrectly interprets '普号' as a proper name 'PING' | “普号” means a regular account or standard-tier account and has nothing to do with 'PING'; “额度” here is a usage quota/limit, not a financial 'credit limit'; the terminology is misunderstood overall. | Mistranslation of '普号' (regular/standard account) as 'PING', which appears to be a hallucination or severe error. | '普号' means 'regular/standard account', not 'PING'. The translation misinterprets the abbreviation. |
| CohereLabs/tiny-aya-water | Actually, I usually check some information, the credit limit of PING is already enough. |
Major
[Accuracy]
"PING"
Rationale: Mistranslation of '普号' (standard/regular account) as 'PING'. | “普号” means 'regular account/standard-tier account', not 'PING'. Associating “普” with 'ping' is a clear mistranslation. | '普号' means 'regular/standard account', not 'PING'. The translation incorrectly converts a Chinese term into an acronym. | Mistranslation: '普号' means 'regular account' (ordinary tier), not 'PING' which is a brand name |
| Qwen/Qwen2.5-14B-Instruct | In fact, checking information regularly, the standard account's limit is already sufficient. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Actually,平时 checking resources regularly is enough for普通号码. |
Major
[Accuracy]
"普通号码"
Rationale: “普号” commonly means 'regular account' rather than 'ordinary phone number'; rendering it literally as 'number' easily causes misunderstanding. | '普号' means 'regular/standard account', not 'ordinary number code' | The hypothesis is a mix of English and untranslated Chinese characters ('平时', '普通号码'), failing to produce a coherent target language (zh) sentence. The source was already Chinese; the output should be Chinese, not a broken translation. |
| Qwen/Qwen3-14B | Actually, checking information on a regular basis, the general credit limit is already completely sufficient. |
Major
[Accuracy]
"the general credit limit"
Rationale: '普号的额度' refers to 'regular/free account quota/allowance', not 'general credit limit'. The terminology is inaccurate. | '普号' means 'standard account' or 'regular account'. 'Credit limit' implies a financial constraint, whereas the source refers to the quota/limits of a standard tier account. | “普号的额度” is quota/limit of a standard account, not necessarily a financial ‘credit limit’; “regular account quota” would be more accurate. |
| Qwen/Qwen3.5-4B | Actually, checking up on materials regularly is enough to cover the quota for general numbers. |
Major
[Accuracy]
"general numbers"
Rationale: 普号 means 'regular/free account', and 额度 means 'quota/limit'. The translation is confusing and doesn't clearly convey that a free account's quota is sufficient for regular use. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. | “普号” means ‘regular/standard account’, not ‘general numbers’. ‘Numbers’ misinterprets ‘号’ and obscures that it refers to account type. |
| Qwen/Qwen3.5-9B | Actually, for regular research, the quota for standard accounts is already sufficient. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | In fact, when you check the resources regularly, the quota for the general number is already sufficient. | 通过 (无共识错误) |
| google/gemma-3-12b-it | In practice, checking some information regularly is usually sufficient with a standard account. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Actually, I've been checking documents lately, and the limit for the account is completely used up. |
严重
[准确性]
"the limit for the account is completely used up"
理由: Incorrect meaning. '额度已经完全够用' means 'the quota/limit is completely sufficient', not 'used up'. Should be 'the quota is completely adequate'. | 原文“额度已经完全够用了”是额度足够用,并非“已经用完”,译文方向相反。 | Incorrect translation - '额度完全够用了' means 'the limit is fully sufficient' not 'used up' - opposite meaning | Contradicts the source. Source '完全够用了' means 'completely sufficient', while hypothesis 'completely used up' implies exhaustion/insufficiency. |
| google/gemma-3-4b-it | Actually, for regular research, the standard quota is already completely sufficient. | 通过 (无共识错误) |
| google/translategemma-12b-it | In fact, by simply checking some information regularly, the credit limit for a general card is usually sufficient. |
严重
[准确性]
"credit limit for a general card"
理由: Mistranslation. '普号' refers to a regular/free account, not 'general card'. '额度' in this context means quota/limit for a service, not credit card limit. Should be 'the quota for a regular account'. | “查查资料”是“查资料/搜信息”,与金融用语“credit limit”无关;“普号”是普通账号,不是“general card”,整体将场景误译成信用卡额度。 | '普号' refers to a 'standard account' or 'regular account', not a 'general card'. The translation introduces an incorrect entity (card vs account). |
| google/translategemma-4b-it | In fact, for everyday research, the standard amount is already sufficient. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | In fact, if you check some information usually, the limit for the common account is already sufficient enough. | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | In fact, when looking up information regularly, the quota of a regular account is already more than enough. | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The suspension measures by A Co. are a bit too harsh. | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | The suspension measures by A Company recently were quite severe. | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Company A has been suspending accounts quite heavily lately. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | A社 recently imposed quite strict account bans. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Company A has been cracking down on account suspensions recently. |
严重
[准确性]
"cracking down on account suspensions"
理由: '封号力度' means 'intensity/severity of account bans', not 'cracking down on account suspensions'. The phrasing is awkward and inaccurate. | Logical error. One cracks down on 'violations' or 'users', resulting in suspensions. You don't crack down on the suspensions themselves. Should be 'cracking down on accounts' or 'suspending accounts aggressively'. | “封号力度有点大” means the platform is suspending/banning more aggressively; “cracking down on account suspensions” reverses the relationship. Should be “cracking down by suspending accounts” or “banning a lot of accounts.” |
| Qwen/Qwen3.5-4B | A Company has been banning accounts quite heavily recently. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | A Corp's recent ban enforcement is quite harsh. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | A's recent ban rate has been quite high lately. |
严重
[准确性]
"A's recent ban rate"
理由: Mistranslation: '封号力度' means 'intensity of account banning', not 'ban rate' | '封号力度' means 'intensity/severity of account bans', not 'ban rate'. Should be 'A has been banning accounts quite aggressively lately'. | “A社”应为“A company / Company A”,而非“A’s”;“封号力度有点大”指封号很严厉或频率高,可用“crackdown on accounts is quite strong”,当前译文表达不够自然且主语略有误导。 | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-12b-it | Company A has been banning accounts quite aggressively recently. | 通过 (无共识错误) |
| google/gemma-3-1b-it | A company recently received a strong ban, and the situation is quite serious. |
严重
[准确性]
"A company recently received a strong ban, and the situation is quite serious."
理由: Incorrect interpretation. '封号力度有点大' means 'the intensity/severity of account banning is quite high', not 'received a strong ban'. Should convey that the company is banning accounts aggressively. | Incorrect translation of '封号力度' (intensity of account banning) - 'received a strong ban' is grammatically incorrect and changes the meaning | Subject error. Source says Company A is banning users ('封号力度'), hypothesis says Company A received a ban. | 原文“封号力度有点大”是“封号(封禁账号)比较严格/严重”,主语是A社对用户封号;译文误解为“A公司被封禁”,主客体关系颠倒。 |
| google/gemma-3-4b-it | The community for this game has recently been heavily banned. |
致命
[准确性]
"The community for this game"
理由: A社 refers to a company/service provider, not 'the community for this game'. 封号力度 means 'intensity of account banning', not 'heavily banned'. Translation is fundamentally inaccurate. | Complete mistranslation - 'A社' refers to a company (Company A), not 'community'; '封号' means 'banning accounts', not 'banning community' | The hypothesis is in English, not Chinese. 'A 社' refers to a specific company (e.g., Arc System Works or similar), not 'The community'. '封号' means banning accounts, not banning the community itself. | “A社” means “Company A” or “A Corp.”; “community for this game” is an invented, incorrect specification. |
| google/translategemma-12b-it | Recently, Company A has been banning accounts quite frequently. | 通过 (无共识错误) |
| google/translategemma-4b-it | Subscribe |
致命
[准确性]
"Subscribe"
理由: Complete mistranslation. The source 'A社最近封号力度有点大啊' means 'Company A has been cracking down on account bans recently'. The hypothesis 'Subscribe' is completely unrelated to the source meaning. | Hypothesis does not translate source. Source discusses account suspension intensity but hypothesis is just 'Subscribe' | Completely unrelated; source says A社 has recently been banning accounts more aggressively. | Complete hallucination. The source text discusses Company A's recent aggressive account banning policies. The translation is unrelated. | The translation does not match the source text, which is about account suspension. |
| tencent/HY-MT1.5-1.8B | A社最近封号力度有点大啊。 | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Company A has been taking rather strict measures recently. | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | A÷no unexplained yet my number was raised. |
轻微
[准确性]
"A÷no unexplained yet my number was raised."
理由: The slang '扬了' means 'banned' or 'suspended,' but the translation is incorrect. | Mistranslation: '扬了' is internet slang meaning 'to ban/delete (an account)', not 'raised'. The sentence structure is also broken. Should be something like 'A÷ without warning deleted my account again' | Complete mistranslation of meaning. '毫无征兆' (without warning) is rendered as 'no unexplained yet', and '扬了' (slang for banned/deleted) is incorrectly translated as 'raised'. The sentence structure is also incoherent. | “A÷”应为“A社”等平台称呼残留,“毫无征兆又把我号给扬了”意为“毫无征兆又把我的账号封了/炸了”,译文词语无意义且整体句子不可理解,完全未传达原意。
严重
[流畅性]
"A÷no unexplained yet my number was raised."
理由: Severely broken English. '扬' in this context means 'banned/suspended', not 'raised'. Should be 'unexpectedly banned my account' or similar. | 句法混乱,“no unexplained yet my number was raised”不符合英语语法,且表达不通顺。 | Grammatically incorrect and nonsensical phrase in English. |
| CohereLabs/tiny-aya-water | A÷Hasn’t any sign and also made my number go up. |
严重
[准确性]
"made my number go up"
理由: Complete failure to translate slang '扬了' (got banned/deleted) and 'A÷' (derogatory nickname). The output is nonsensical. | “把我号给扬了”是“把我的账号封了/销了”。“made my number go up”完全错误理解“号”的含义。 | Mistranslation: '扬了' means 'destroyed/banned' (my account was banned), not 'go up' | '扬了' in this context means 'banned/suspended my account', not 'made my number go up'. The translation is fundamentally incorrect. | The slang term '扬了' means to ban, not to increase. |
| Qwen/Qwen2.5-14B-Instruct | A÷Out of nowhere, my account got suspended again. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | A÷ suddenly broadcasted my number without any warning. |
严重
[准确性]
"broadcasted my number"
理由: “把我号给扬了”是把账号封了/销号的俚语,不是“广播我的号码”;完全误解。 | The hypothesis is in English instead of the target language (zh). The slang '扬了' (slang for getting banned/deleted/scattered) is mistranslated as 'broadcasted'. | '扬了' in this context means 'banned/suspended', not 'broadcasted'. Incorrect interpretation of slang. | Wrong meaning: '扬了' means 'banned/destroyed' not 'broadcasted' |
| Qwen/Qwen3-14B | A÷Suddenly deleted my account without any warning. |
轻微
[准确性]
"Suddenly deleted my account"
理由: '扬了' is slang for 'banned', 'terminated', or 'destroyed'. 'Deleted' is technically possible but 'banned' or 'terminated' is more accurate for account actions by a company. | '扬了' is internet slang meaning 'banned/suspended', not 'deleted'. The translation misses the specific meaning. | “把我号给扬了” is slang for banning/terminating the account; “deleted” is close but “banned”/“suspended my account” is more accurate. |
| Qwen/Qwen3.5-4B | A ÷ suddenly deleted my account without any warning. |
严重
[准确性]
"deleted my account"
理由: The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. The symbol '÷' is retained oddly in the English output. | “把我号给扬了” is slang for banning or nuking an account. ‘Deleted’ is close but slightly narrower than ‘banned/terminated’; nuance of being ‘banned’ is more typical. | 扬 in this context is internet slang meaning 'ban/suspend', not 'delete'. Also the symbol 'A÷' appears to be a coded reference that the translation doesn't clarify. |
| Qwen/Qwen3.5-9B | A÷vanished my account without any warning. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | A÷毫无征兆又把我号给扬了. |
致命
[准确性]
"A÷毫无征兆又把我号给扬了."
理由: The hypothesis is identical to the source text (copy-paste), failing to perform any translation. | The translation is a direct copy of the source without translation. | No translation provided - hypothesis is identical to source, which is in Chinese but target language is zh (should be English translation) | No translation provided. The hypothesis is identical to the source text in Chinese. This is a complete failure to translate. | 基本未翻译,原句含义是“毫无征兆又把我的号封了/封禁了”,译文只是原文重现,缺失关键信息传达。 |
| google/gemma-3-12b-it | A sudden and unexpected ban on my account. |
严重
[准确性]
"A sudden and unexpected ban on my account."
理由: The translation omits the subject 'A÷' (Company A) which is present in the source. The phrase '又把我号给扬了' means 'banned my account again' - '扬' is internet slang for 'ban'. The translation loses both the subject and the 'again' aspect. | Mismatch with target language: the required language is zh, but the hypothesis is in English. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-1b-it | A÷ without warning, I have given you the 'yang' (a weapon) of the 'Zhen' (the god). |
致命
[准确性]
"A÷ without warning, I have given you the 'yang' (a weapon) of the 'Zhen' (the god)."
理由: Complete hallucination and failure to translate slang '扬了' (slang for account deletion/banning). | Incorrect translation of '扬了' as a weapon instead of 'banned' or 'suspended.' | Completely incorrect translation - '扬了' (slang meaning to ban/destroy an account) is translated as giving a weapon - completely wrong interpretation of the slang | 原句大概率是“A社毫无征兆又把我号给扬了”,意思是A社又在毫无征兆的情况下把我的账号给封了;译文误读“扬”为字面“扬、神兵”,整体语义完全错误且极不通顺。 |
| google/gemma-3-4b-it | Suddenly, my number was unexpectedly broadcast. |
致命
[准确性]
"my number"
理由: The hypothesis is in English, not Chinese. '扬了' (yang le) is slang for being destroyed, deleted, or banned (accounts), not 'broadcast'. '号' refers to an account, not a phone number. | Complete mistranslation - '扬' in internet slang means 'to ban/destroy' (account), not 'broadcast'; '号' means 'account', not 'number' | 扬 in this context means 'banned/suspended', not 'broadcast'. The translation completely misses the meaning. Also unclear what 'A÷' refers to. | “号” here refers to an account, not a number; mistranslation of key term. |
| google/translategemma-12b-it | I was banned again without any warning. |
严重
[准确性]
"I was banned again without any warning."
理由: Missing subject reference. 'A÷' appears to be a coded reference to a specific company/service that should be preserved or clarified. The translation loses this context. | The source 'A÷' (A-hatred/slur for Company A) and '扬了' (slang for destroyed/banned) carry a specific hostile tone and subject. The hypothesis omits the subject 'Company A' (referred to derogatorily) and neutralizes the slang '扬了' to just 'banned', losing some nuance but remaining understandable. | Missing context: 'A÷' is a reference to a company (Company A), and '扬了' means 'destroyed/banned my account', not just 'banned'. The reference to Company A is lost | 原文主语是“A÷”,应是某平台/公司的代称,“又把我号给扬了”是“又把我的号封了”;译文丢失施事者(平台),并用被动句模糊主语,且完全省略“A÷”的信息。 |
| google/translategemma-4b-it | Suddenly, my account was suspended without any warning. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | A÷毫无征兆又把我号给扬了。 | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | A÷ suddenly lifted my account again without any warning. |
致命
[准确性]
"lifted my account"
理由: Critical error. '扬了' is slang for 'deleted', 'banned', or 'destroyed' (scattered like dust). 'Lifted' implies raising or removing a ban, which is the exact opposite of the intended meaning. | '扬' in this context is slang meaning 'banned/suspended', not 'lifted'. The translation is completely opposite in meaning. | Incorrect translation: '扬了' (slang meaning 'banned/deleted') was mistranslated as 'lifted' instead of 'banned' or 'suspended' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The new version is now available for download. |
致命
[准确性]
"The new version is now available for download."
理由: Completely wrong output: The hypothesis has no relation to the source. '谷畜又开始发癫大封锁了' means 'Google fanatics are going crazy again (发癫) and doing massive crackdowns/blocking' - it's about censorship/crackdowns, not a new version | 原文“谷畜又开始发癫大封锁了。”指谷歌大规模封锁/封号,与“新版本可下载”完全不相干。 | The translation does not address the source content about 'massive restrictions.' | Hallucination. The source text complains about 'Google livestock' (derogatory term for Google/fans) going crazy with a 'big blockade', but the hypothesis invents a completely unrelated positive message about a new version download. | Completely unrelated translation. Source discusses Google's account suspension policies. Hypothesis is about software updates - entirely wrong content. |
| CohereLabs/tiny-aya-water | Yootianxiuguo has started another big lockdown. |
严重
[准确性]
"Yootianxiuguo"
理由: '谷畜' is internet slang for a company (likely Google), not 'Yootianxiuguo'. The translation is nonsensical and fails to convey meaning. | Mistranslation: '谷畜' is derogatory slang for 'Google fanboys', transliterated incorrectly as 'Yootianxiuguo' which is meaningless | 把“谷畜”错误地音译成毫无意义的词。“谷畜”是对“谷歌/Google”的贬称,并非新专有名。 | Hallucinated or garbled translation of '谷畜' (derogatory slang for Google). '发癫' (going crazy/acting erratically) is also missed. |
| Qwen/Qwen2.5-14B-Instruct | The livestock has started another big lockdown again. |
严重
[准确性]
"The livestock"
理由: The slang '谷畜' refers to Google, not actual livestock. The translation does not capture this metaphor. | “谷畜” is a derogatory nickname for a company (likely Google), not actual livestock; literal translation distorts the referent. | Mistranslation of derogatory slang '谷畜' (Gu Chu - Google Pig/Beast). Translating it as 'The livestock' implies actual animals rather than a derogatory term for Google or its users/employees. It needs to be identified as a slur against Google. | Incorrect translation direction. Source is Chinese, hypothesis is English. Should translate FROM Chinese TO target language. |
| Qwen/Qwen2.5-7B-Instruct | 谷畜又开始发癫大封锁了。 | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Guxu has gone mad again and started a big blockage. |
严重
[准确性]
"Guxu"
理由: “谷畜” is a pejorative play on Google; transliteration loses the reference to Google and the derogatory tone. | '谷畜' is derogatory internet slang (likely for a company), '发癫' means 'acting crazy/being unreasonable', '大封锁' means 'mass bans/restrictions'. Translation is inaccurate and doesn't convey the meaning properly. | '谷畜' is a derogatory slang term for Google (combining 'Gu' from Google and 'chu' meaning beast/livestock). Transliterating as 'Guxu' preserves neither the meaning nor the insult, rendering it nonsense. |
| Qwen/Qwen3.5-4B | The livestock sector has gone crazy with another major lockdown. |
轻微
[准确性]
"The livestock sector"
理由: 谷畜 is internet slang (derogatory term for a service provider), not 'livestock sector'. 发癫 means 'acting crazy/malfunctioning', and 大封锁 means 'mass banning'. The translation is completely inaccurate. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. '谷畜' is a derogatory slang for Google users/company, mistranslated literally as 'livestock sector'. | '谷畜' is internet slang referring to Google (谷=Google, 畜=something derogatory). The translation completely mistranslates this as 'livestock sector', completely losing the meaning and being incomprehensible. | “谷畜” is a derogatory nickname for Google, not ‘the livestock sector’. The hypothesis misidentifies the entity. |
| Qwen/Qwen3.5-9B | Grain and livestock restrictions have started again. |
致命
[准确性]
"Grain and livestock"
理由: 谷畜 is internet slang (likely referring to Google); 发癫大封锁 means 'going crazy with massive lockdowns/bans' - translation completely misses the meaning | “谷畜” is a pejorative nickname for Google; translating it literally as “Grain and livestock” completely misses the referent. | The hypothesis is in English and completely mistranslates the slang. '谷畜' is a derogatory portmanteau for 'Google' (谷) and 'livestock/beasts' (畜), referring to Google's aggressive policies. The translation interprets it literally as 'Grain and livestock', losing the entire meaning. '发癫' (going crazy/acting erratically) is also missed. | Severe mistranslation: 谷畜 is slang for Google (谷=Google, 畜=beast/animal), not literal grain and livestock; entire meaning is lost |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The herds have started blocking again. |
严重
[准确性]
"The herds"
理由: '谷畜' is internet slang for Google (derogatory). '发癫' means 'going crazy/acting up'. Should be 'Google is acting up with massive blocking again'. | Mistranslation: '谷畜' is derogatory slang for Google users, '发癫大封锁' means 'going crazy with massive blocking', not simply 'blocking' | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. | “谷畜”是对谷歌的贬称/戏称,不是“the herds”;应译为“Google (derogatory nickname)”之类。 |
| google/gemma-3-12b-it | The valley livestock has started to go crazy and impose a large lockdown again. |
严重
[准确性]
"The valley livestock has started to go crazy and impose a large lockdown again."
理由: '谷畜' is internet slang referring to a certain tech company (Google), not literal 'valley livestock'. The translation is too literal and fails to convey the intended meaning. | Mismatch with target language: the required language is zh, but the hypothesis is in English. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-1b-it | The livestock is convulsing violently and locks have been placed on the area. |
致命
[准确性]
"The livestock is convulsing violently and locks have been placed on the area."
理由: Incorrect translation of '谷畜' (slang for Google) as 'livestock' and '发癫大封锁' (going crazy with massive bans) as 'convulsing and locks' - completely wrong interpretation of the slang | Literal translation of derogatory slang '谷畜' (Google livestock/users) and '发癫' (acting crazy/aggressive policy), resulting in nonsense. | “谷畜”是对谷歌的贬称,“发癫大封锁”指疯狂封禁/大规模封锁账号或访问;译文将其理解为字面“牲畜痉挛并封锁某地区”,完全偏离互联网平台语境。 | Incorrect translation of '谷畜' as literal livestock instead of a specific company. |
| google/gemma-3-4b-it | The poultry and livestock industry is starting to go crazy again with a massive lockdown. |
致命
[准确性]
"The poultry and livestock industry"
理由: Complete mistranslation - '谷畜' is internet slang for Google (谷=Google, 畜=short for 畜生), not literal poultry; '发癫' means 'going crazy' but context is about platform restrictions | 谷畜 is internet slang (likely referring to Google), not literal 'poultry and livestock'. 发癫 means 'acting crazy/malfunctioning', and 大封锁 means 'mass banning'. Translation is completely off. | “谷畜” is derogatory slang for Google, not literal livestock industry. This completely misinterprets the subject. | The hypothesis is in English, not Chinese. '谷畜' is a derogatory slang term for Google (combining 'Gu' from Google and 'chu' meaning livestock), not the actual poultry industry. |
| google/translategemma-12b-it | The authorities have started another large-scale lockdown. |
轻微
[准确性]
"The authorities"
理由: “谷畜”是对某科技公司(多指Google)的贬称,不是“当局/政府”;将其译为“the authorities”严重改变指代对象。 | Mistranslation: '谷畜' is a derogatory slang term for Google, not 'authorities'. The translation completely misses the negative sentiment toward Google | '谷畜' is a derogatory slang term specifically for 'Google' (combining 'Gu' from Google and 'chu' meaning beast/livestock). Translating it as 'The authorities' completely misidentifies the subject and removes the specific derogatory intent towards Google. | Severe mistranslation. '谷畜' is internet slang for a specific service provider, not 'authorities'. '发癫' means 'going crazy/acting up', not 'lockdown'. Should be something like 'Google/[service] is acting up with another major crackdown'. |
| google/translategemma-4b-it | Edit |
致命
[准确性]
"Edit"
理由: The translation does not match the source text, which is about a company's lockdown. | Complete hallucination. The source text uses derogatory slang ('谷畜', '发癫') to criticize Google's recent strict blocking measures. The translation is unrelated. | Unrelated single word; source says Google (谷畜) has started another big wave of crazy blocking. | Complete mistranslation. The source '谷畜又开始发癫大封锁了' means 'Google (谷畜 - a derogatory term) is having another fit of madness (发癫) and doing massive blocking'. The hypothesis 'Edit' is completely unrelated. | Hypothesis does not translate source. Source discusses service restrictions/lockdowns but hypothesis is just 'Edit' |
| tencent/HY-MT1.5-1.8B | Contact |
致命
[准确性]
"Contact"
理由: Hypothesis 'Contact' has no relation to source text '谷畜又开始发癫大封锁了' (Google is going crazy with mass account bans again). Completely incorrect translation. | Completely incorrect translation: '谷畜' is slang (referring to Google), '发癫' means 'going crazy/acting insane', '大封锁' means 'mass blocking/banning'. The hypothesis bears no relation to the source | The hypothesis is completely unrelated to the source text and is in English instead of the target language Chinese (zh). | The hypothesis does not provide a meaningful translation of the source text. | Completely unrelated to the source; total loss of meaning regarding Google’s large-scale blocking behavior. |
| tencent/HY-MT1.5-7B | "en": "Livestock have started to go crazy again, and there is a complete blockade.", "zh-TW": "家畜又開始發狂了,而且實施了全面封鎖!", "ja": "家畜が再び狂い出し、完全な封鎖が実施されています。", "ko": "가축들이 다시 미쳐버렸고, 전면적인 봉쇄가 이루어졌습니다。", "de": "Das Vieh hat wieder angefangen, verrückt zu werden, und es gibt eine vollständige Blockade.", "fr":="Le bétail a recommencé à devenir fou, et une blocade complète a été mise en place.", "it":="Il bestiame ha iniziato di nuovo ad impazzire, e è stata attuata una bloccade totale."}\|im_end\|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Free users gave such a large amount, not bad at all! |
严重
[准确性]
"Free users gave such a large amount, not bad at all!"
理由: '不愧是谷圣' is internet slang with sarcastic/ironic tone. 'Not bad at all' doesn't capture the sarcasm or the reference to 'Google Saint' (谷圣). | Incorrect subject. The source implies the service provider gave the quota TO free users ('免费用户给了' contextually means 'Gave free users...'), whereas the translation says free users gave the amount. | 应为“给免费用户这么大额度”,即系统/平台为免费用户提供大额度,译文变成“免费用户给了这么大额度”,主客体颠倒且漏译“是谷圣”对“谷歌”的戏称和褒义感叹。 |
| CohereLabs/tiny-aya-water | Free users gave such a large amount, not bad at all! |
严重
[准确性]
"Free users gave such a large amount"
理由: Incorrect subject; the company gave the quota to free users, not vice versa. Misses the sarcasm in '谷圣' (Google Saint). | 施事者搞反:原文是“(平台)给免费用户这么大额度”,而不是“免费用户给了这么大额度”。主语误解导致含义反转。 | Incorrect: The source says free users received large quota (as a gift), not that they gave anything
严重
[准确性]
"not bad at all"
理由: “不愧是谷圣”是带敬称/戏谑的夸张赞美(worthy of being called Saint Google),而非简单“not bad at all”,语气和评价强度被削弱。 | '不愧是谷圣' is sarcastic praise meaning 'truly worthy of being called [company name]', not 'not bad at all'. The translation misses the sarcasm. | Mistranslation: '谷圣' is sarcastic slang for 'Google God' (ironic praise), not 'not bad at all' |
| Qwen/Qwen2.5-14B-Instruct | Free users have given such a large amount, truly worthy of being a Valley Sage! |
严重
[准确性]
"Free users have given such a large amount"
理由: Grammatical/Logical error. The source says 'Gave free users such a large quota'. The translation says 'Free users have given', reversing the subject and object. | Incorrect translation direction. Source is Chinese, hypothesis is English. Should translate FROM Chinese TO target language. | Subject is wrong; it should be ‘they gave free users such a large quota’, not that free users gave something. |
| Qwen/Qwen2.5-7B-Instruct | Free users gave such a high limit; no wonder they are Valley Sages! |
严重
[准确性]
"Valley Sages"
理由: Incorrect: '谷圣' is slang for Google, translated as 'Valley Sage' loses the reference to Google | '谷圣' is internet slang/sarcasm, not literal 'Valley Sages'. Context and tone lost. | “谷圣”是对某大厂(如Google)的戏称/尊称,直译为“Valley Sages”难以让目标读者理解与该公司关联。 |
| Qwen/Qwen3-14B | Free users are given such a generous quota, no wonder it's Gu Sheng! |
严重
[准确性]
"Gu Sheng"
理由: “谷圣” is a joking honorific for Google; transliteration as a name loses the reference and nuance; something like “Saint Google” would be closer. | '谷圣' is a sarcastic slang term (Google Saint). Transliterating as 'Gu Sheng' conveys no meaning to an English reader. The sarcasm and the reference to Google are lost. | '谷圣' is internet slang (likely sarcastic reference to a company/service). Translating as 'Gu Sheng' is meaningless. The sarcastic tone and reference are lost. |
| Qwen/Qwen3.5-4B | Free users get such generous limits; truly worthy of being the God of Vals! |
严重
[准确性]
"God of Vals"
理由: The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. '谷圣' (Google Saint) is mistranslated as 'God of Vals'. | “谷圣” is a pun/derogatory or ironic nickname for Google (Google + ‘saint’). ‘God of Vals’ is unrelated and incorrect. | 谷圣 is internet slang for a service provider (not 'God of Vals'). The translation completely misses the meaning and context. |
| Qwen/Qwen3.5-9B | Free users get such generous limits; truly worthy of being a Valley Saint! |
严重
[准确性]
"Valley Saint"
理由: The hypothesis is in English. '谷圣' is a sarcastic slang term for Google (Valley + Saint), which is translated literally. The main error is the wrong language. | “谷圣” is a playful honorific for Google; literal “Valley Saint” obscures the reference, though tone of praise is preserved. | 谷圣 is internet slang combining 谷 (Google) and 圣 (saint); 'Valley Saint' is awkward - should be 'Google Saint' or similar | 谷圣 is a slang term likely referring to a person or entity; 'Valley Saint' is too literal and doesn't convey the intended meaning (likely a nickname for someone) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Free users have such a large quota, so they must be from Baidu! |
轻微
[准确性]
"so they must be from Baidu"
理由: Major mistranslation: '谷圣' is sarcastic slang for Google (combining '谷' from Google and '圣' meaning saint), not Baidu. This is a complete misidentification of the entity. | '谷圣' refers to Google (slang), not Baidu. The translation is factually incorrect. Should be 'truly worthy of being called Google's saint' or similar. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. | 原文是夸张地称“谷圣”(谷歌圣人般大方),与百度无关;译文将主体错误指向“Baidu”。 |
| google/gemma-3-12b-it | The free user was given such a high limit - they truly live up to the name Valley Saint! |
严重
[准确性]
"The free user was given such a high limit - they truly live up to the name Valley Saint!"
理由: '谷圣' is internet slang referring to a certain tech company (Google), not literal 'Valley Saint'. The translation is too literal and fails to convey the intended meaning. | Mismatch with target language: the required language is zh, but the hypothesis is in English. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-1b-it | Free user gave such a large amount, you are really great! |
严重
[准确性]
"Free user gave such a large amount, you are really great!"
理由: “免费用户给了这么大额度,不愧是谷圣”指“(对)免费用户给了这么高额度(的服务),不愧是谷圣”,主语是服务方;译文误解为“免费用户给出很大金额”,主语和含义皆错误。 | Vague translation. '不愧是谷圣' is sarcastic internet slang (literally 'worthy of being the valley saint'), expressing irony about the company's actions. Should convey sarcasm. | Incorrect translation of '谷圣' (slang for Google, combining '谷' from Google and '圣' meaning saint/king) - translated as 'you are really great' losing the reference to Google | Subject confusion. The company gave the quota to the free user, not the other way around. |
| google/gemma-3-4b-it | Even free users get such a huge credit limit! That’s truly the grace of the Great Sage! | 通过 (无共识错误) |
| google/translategemma-12b-it | The free users were given such a large quota; they truly live up to their name as 'Valley Saints'! | 通过 (无共识错误) |
| google/translategemma-4b-it | Even free users get such a large credit limit! That's truly impressive. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Free users have given such a large amount, truly worthy of being called 'Gu Sheng' | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Free users have been given such a large amount; truly worthy of being called Gu Sheng! | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | I am not sure what you mean, but I can help you with your request. |
致命
[准确性]
"I am not sure what you mean, but I can help you with your request."
理由: Completely wrong output: The hypothesis is a generic error message/clarification request, not a translation. The source '东窗事发,暗自降低了Pro的用量,真不愧是谷畜圣' means 'The scandal broke, they secretly reduced Pro usage, truly worthy of being a Google fanatic saint' - sarcastic commentary about Google reducing Pro tier benefits | Generic non-translation response. Should translate the Chinese source about secretly reducing Pro usage. Completely fails to translate the source text. | 原文描述“东窗事发,暗自降低了Pro的用量,真不愧是谷畜圣。”,译文是通用应答句,与源文内容完全无关。 | The translation does not address the source content about 'suspension measures.' | Complete refusal/hallucination. The model failed to translate the source text, which describes a scandal ('东窗事发') and reduced usage limits, instead outputting a generic assistant response. |
| CohereLabs/tiny-aya-water | The incident was exposed through the window, and Pro's usage was reduced secretly; truly, it's not an ordinary person. |
严重
[准确性]
"The incident was exposed through the window"
理由: The idiom '东窗事发' means a secret is revealed, not literally through a window. | Literal translation of idiom '东窗事发' (crime exposed) loses the idiomatic meaning. | Mistranslation: '东窗事发' means 'the scandal was exposed', not 'through the window' | “东窗事发” is an idiom meaning "the plot was exposed / the matter came to light" and has nothing to do with "through the window"; the literal rendering is a complete misreading.
严重
[准确性]
"truly, it's not an ordinary person"
理由: Failed to translate the sarcastic slang '谷畜圣' (Google Beast/Saint), resulting in a generic and inaccurate phrase. | Completely incorrect translation. '真不愧是谷畜圣' is sarcastic commentary about the company's actions, not about a person. The translation fundamentally misunderstands the sentence. | “真不愧是谷畜圣” mocks the company/platform rather than judging whether some "person" is ordinary; the translation gets the subject wrong and loses the sarcasm. | Mistranslation: '谷畜圣' combines '谷畜' (Google hater) and '圣' (saint/god) sarcastically, meaning 'so-called Google saint', not 'not an ordinary person' |
| Qwen/Qwen2.5-14B-Instruct | The truth came to light; secretly reduced the usage of Pro, truly fitting for a Google sheep. |
严重
[准确性]
"Google sheep"
理由: “谷畜圣” is a mocking blend of “Google” + ‘livestock’ + ‘saint’; “Google sheep” loses ‘saint’ and shifts the insult nuance. | Mistranslation of '谷畜圣'. This is a compound slur combining 'Google', 'Livestock/Pig', and 'Saint'. 'Google sheep' captures only part of the insult and misses the specific 'Saint' sarcasm and the 'Livestock' nuance properly. It should reflect the derogatory nature towards Google. | Incorrect translation direction. Source is Chinese, hypothesis is English. Should translate FROM Chinese TO target language. |
| Qwen/Qwen2.5-7B-Instruct | East Window Incident, secretly reduced the usage of Pro, truly worthy of being called the Sage of Grain Barn. |
严重
[准确性]
"East Window Incident"
理由: '东窗事发' is an idiom meaning 'the truth came out/exposed', not a literal historical reference | “东窗事发” is an idiom meaning that a plot or misdeed has been exposed, not a specific "East Window Incident"; it should be rendered as "once things came to light / after the scheme was exposed". | The hypothesis is in English instead of the target language (zh). '东窗事发' (idiom for crime discovered) is translated literally as 'East Window Incident'. '谷畜圣' (derogatory slang combining 'Google', 'livestock', and 'saint') is mistranslated as 'Sage of Grain Barn'.
严重
[准确性]
"Sage of Grain Barn"
理由: Wrong translation: '谷畜圣' is a compound slang term combining 'Google' and 'bastard', not literal 'Grain Barn Sage' | “谷畜圣” is a derogatory, mocking nickname for Google and the like; the literal "Grain Barn" loses all connection to the company and reads oddly. | '谷畜圣' is internet slang/sarcasm, not literal translation. Meaning and tone misrepresented. |
| Qwen/Qwen3-14B | The East Window incident was exposed, secretly reducing the dosage of Pro, truly living up to the title of the sacred grain beast. |
轻微
[准确性]
"The East Window incident was exposed"
理由: '东窗事发' is an idiom meaning 'the plot is discovered' or 'misdeeds are exposed'. Translating it literally as 'East Window incident' creates a completely fictional and confusing event. | Incorrect translation of '东窗事发' - this idiom means 'the secret is exposed/discovered', not literal East Window incident | '东窗事发' is an idiom meaning 'the truth came out/was exposed', but '暗自降低了Pro的用量' means 'secretly reduced Pro's usage/quota', not 'dosage'. The translation is confused and inaccurate. | “东窗事发” is an idiom meaning “the wrongdoing was exposed”; adding “incident” is redundant and slightly awkward, though meaning largely preserved.
严重
[准确性]
"the sacred grain beast"
理由: '谷畜圣' is internet slang/sarcasm that cannot be literally translated. The translation is nonsensical and fails to convey the intended sarcastic meaning. | '谷畜圣' is a complex derogatory slang term. Literal translation 'sacred grain beast' is nonsensical and fails to convey the intended insult towards the company. | “谷畜圣” is a layered pun on Google with both insult and mock honorific; literal “sacred grain beast” is meaningless in English and loses the intended Google reference and tone. |
| Qwen/Qwen3.5-4B | The matter was exposed at Dongchuang; I secretly reduced my Pro usage. Truly, Gu Chu Sheng is no ordinary person. |
严重
[准确性]
"The matter was exposed at Dongchuang;"
理由: The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. '东窗事发' is an idiom meaning 'the plot is exposed', not a location 'Dongchuang'. '谷畜圣' is a derogatory portmanteau, mistranslated as a name. | 东窗事发 is an idiom meaning 'the matter was exposed/discovered'. 谷畜圣 is internet slang, not a person's name. The translation fundamentally misunderstands the source text. | “东窗事发” is an idiom meaning ‘the wrongdoing has been exposed / the secret is out’, not referring to a place ‘Dongchuang’. |
| Qwen/Qwen3.5-9B | The East Window Incident revealed that he secretly reduced his Pro usage; truly worthy of being called the Holy Beast of the Valley. |
严重
[准确性]
"The East Window Incident"
理由: 东窗事发 is an idiom meaning 'the truth came out/was exposed' not 'incident revealed'; context suggests someone was caught exploiting the system | “东窗事发” is an idiom meaning “the secret is out”; rendering it as a specific “East Window Incident” implies a proper-noun event and distorts meaning. | The hypothesis is in English. '东窗事发' is an idiom meaning 'the crime is discovered,' not a specific 'East Window Incident.' '谷畜圣' is a complex derogatory slang (Google + Beast + Saint) mistranslated as 'Holy Beast of the Valley.' The translation fails to capture the idiomatic and slang nuances and is in the wrong language.
严重
[准确性]
"Holy Beast of the Valley"
理由: 谷畜圣 combines 谷畜 (Google slang) and 圣 (saint/holy); translation 'Holy Beast of the Valley' loses the reference to Google and the sarcastic tone | 谷畜圣 combines slang terms; 'Holy Beast of the Valley' is awkward and unclear - should convey ironic criticism of exploiting the service | “谷畜圣” is a sarcastic nickname for Google; literal rendering loses the specific Google reference though sarcasm is partially preserved. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | East window incident has lowered the use of Pro, truly not worthy of being called the saint of谷畜. |
严重
[准确性]
"East window incident"
理由: '东窗事发' is an idiom meaning 'the truth came out/exposed', not literally 'East window incident'. Should be 'The truth came out, they secretly reduced Pro usage'. | The source text is in Chinese, but the hypothesis is primarily in English with mixed characters. The translation direction is incorrect. | “东窗事发” is an idiom meaning that a plot has been exposed, not a literal "East window incident". |
| google/gemma-3-12b-it | Caught red-handed, secretly reduced Pro usage, truly worthy of being a Valley Beast Saint. |
严重
[准确性]
"Caught red-handed, secretly reduced Pro usage, truly worthy of being a Valley Beast Saint."
理由: '谷畜圣' is internet slang referring to a certain tech company (Google), combining '谷畜' (valley beast) and '圣' (saint). The translation is too literal and fails to convey the intended meaning. | Mismatch with target language: the required language is zh, but the hypothesis is in English. | The source text is in Chinese, but the hypothesis is in English. The translation direction is completely reversed. |
| google/gemma-3-1b-it | The window incident at Dong Shang resulted in a reduction of Pro's dosage, truly worthy of the Valley Livestock God. |
致命
[准确性]
"The window incident at Dong Shang"
理由: Multiple errors: '东窗事发' (when something goes wrong/is exposed) incorrectly translated as 'window incident', '谷畜圣' (slang for Google) incorrectly translated as 'Valley Livestock God', '用量' (usage/quota) incorrectly translated as 'dosage' - completely wrong interpretation of the slang and context | “东窗事发” is an idiom meaning "the plot was exposed / the matter came to light", not an "incident at Dong Shang / the east window"; it should be rendered as "after the plot was exposed / after things came to light" or similar. | Literal translation of idiom '东窗事发' (plot discovered/crime exposed) treated as a specific location/event 'Dong Shang'. |
| google/gemma-3-4b-it | An incident occurred during the east window event, and the usage of Pro was secretly reduced. Truly worthy of being a Valley cattle saint. |
致命
[准确性]
"An incident occurred during the east window event"
理由: 东窗事发 is an idiom meaning 'the truth came out/was exposed', not 'incident during east window event'. Translation is overly literal. | “东窗事发” is an idiom meaning ‘the secret was exposed / wrongdoing came to light,’ not a literal ‘east window event.’ Meaning is badly distorted. | The hypothesis is in English, not Chinese. '东窗事发' is an idiom meaning 'the plot is discovered/crime exposed', not an 'east window event'. '谷畜圣' is a derogatory portmanteau for Google, mistranslated as 'Valley cattle saint'. |
| google/translategemma-12b-it | When things went wrong, I secretly reduced my Pro usage. True to form, I am a loyal user of Microsoft. |
致命
[准确性]
"Microsoft"
理由: “谷畜圣” is a mocking nickname for a company whose name begins with “谷” (usually Google) and has nothing to do with Microsoft; rendering it as Microsoft is an entity-recognition error. | The source text uses '谷畜圣' (a derogatory/sarcastic term for Google). The hypothesis incorrectly identifies the company as 'Microsoft'. This is a severe hallucination/error in entity recognition based on slang. | Completely wrong translation: '谷畜圣' combines '谷畜' (derogatory term for Google user) and '圣' (saint), referring to Google worshippers. The translation incorrectly says 'Microsoft' and 'loyal user' which is the opposite of the source meaning | Severe mistranslation. '谷畜圣' is internet slang for a specific service provider, not Microsoft. The entire sentence meaning is lost. '东窗事发' means 'when the truth came out/when caught', not 'when things went wrong'. |
| google/translategemma-4b-it | After the incident at Dongxuan, I secretly reduced the use of Pro, truly worthy of being a sacred cow. |
严重
[准确性]
"After the incident at Dongxuan"
理由: Mistranslation of the idiom '东窗事发' (the plot is discovered/crime exposed). The translator treated it as a proper noun 'Dongxuan', which is incorrect. | 东窗事发 is an idiom meaning 'when the truth comes out/scandal exposed', not 'incident at Dongxuan'; mistranslated as proper noun | “东窗事发” is an idiom meaning “the crime/secret has been exposed,” not a place called “Dongxuan.”
严重
[准确性]
"sacred cow"
理由: Mistranslation of '谷畜圣'. The source combines 'Google', 'livestock' (derogatory), and 'Saint'. 'Sacred cow' is an English idiom meaning something immune to criticism, which does not match the derogatory/sarcastic tone of the Chinese slang here. | Incorrect translation. '谷畜圣' is a derogatory term combining '谷' (Google) and '畜' (animal/bastard), used sarcastically. The hypothesis translates it as 'sacred cow' which loses the derogatory meaning and sarcasm. Also '东窗事发' means 'when the scandal broke' not 'incident at Dongxuan'. | “谷畜圣” is a sarcastic play on Google as both ‘livestock’ and ‘saint’; “sacred cow” loses the negative nuance and wordplay. | 谷畜圣 is internet slang combining service name with derogatory term, not 'sacred cow'; translation completely misses the meaning |
| tencent/HY-MT1.5-1.8B | The truth about Pro's usage was revealed secretly, truly a pitiful creature. |
严重
[准确性]
"The truth about Pro's usage was revealed secretly"
理由: The hypothesis is in English, but the target language is Chinese (zh). The slang term '谷畜圣' was mistranslated and lost its specific derogatory nuance. | Misreads “东窗事发,暗自降低了Pro的用量” which means ‘after being exposed, they secretly reduced Pro usage’; hypothesis invents a different event (‘truth revealed secretly’). | '东窗事发' means 'the truth came to light/was exposed', not 'was revealed secretly'. The translation contradicts itself.
严重
[准确性]
"truly a pitiful creature"
理由: '谷畜圣' is sarcastic internet slang. The translation 'pitiful creature' is inaccurate and misses the sarcastic tone entirely. | “谷畜圣” is a sarcastic nickname for Google (mix of 谷畜 and 谷圣), not ‘a pitiful creature’; sentiment and reference are mistranslated. | Incorrect translation: '谷畜圣' combines '谷畜' (slang for Google, implying negative connotation) and '圣' (saint/king). The translation 'pitiful creature' misses the sarcastic/ironic tone and the compound meaning of the original slang |
| tencent/HY-MT1.5-7B | "en": "The truth came out when the east window was exposed; secretly reducing the usage of Pro is truly worthy of being called a sacred beast.", "zh-TW": "當東窗事發時,秘密減少Pro的使用量,確實配得上被稱為神聖的野獸。", "ja": "東窗事発の際に、こっそりProの使用量を減らしていたなんて、まさに神聖な獣と呼ぶにふさわしい。」, "ko": "동창사건이 밝혀지자, Pro의 사용량을 몰래 줄였다니, 정말로 신성한 짐승이라고 불릴 만하다。」, "de": "Als die Wahrheit durch das ‘Östliche Fenster’ ans Licht kam, hatte man heimlich die Menge an Pro reduziert – wirklich würdig, als ‘heiliges Tier’ bezeichnet zu werden。」, "fr": "Lorsque la vérité est apparue à travers le ‘Fenêtre de l’Est’, on avait secrètement réduit la quantité de Pro utilisée… Vraiment digne d’être appelé un ‘animal sacré’。"}, "it": "Quando la verità è venuta alla luce attraverso il ‘Finestra Orientale’, si era segretamente ridotta la quantità di Pro utilizzata… Davvero degna di essere chiamata un ‘animale sacro’."} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | "en": "Title: The anti-gravity kill is so strict, I don't dare use it anymore. What kind of anti-anti gravity do everyone use? (", "zh-TW": "標題:反重力殺手很嚴格,我不再使用它。大家都在用什麼反反重力?(", "ja": "タイトル:アンチグライビティのキルは非常に厳しく、もう使用しなくなりました。みんな何のアンチアンチグライビティを使用していますか?(", "ko": "제목: 반중력 살인은 너무 엄격해서 더 이상 사용하지 않습니다. 모두들 어떤 반반중력 사용을 하고 있나요?(), "de": "Titel: Die Anti-Grav-Kill ist sehr streng, ich verwende sie nicht mehr. Welche Art von Anti-Anti-Grav verwenden alle?", "fr": "Titre : Le meurtre anti-gravité est si strict, je ne l'utilise plus. Qu'est-ce que tout le monde utilise comme anti-antigravité ?(), "it": "Titolo: L'omicidio anti-gravità è così rigoroso, non lo uso più. Cosa usano tutti come anti-antigravità ?" } <| |
严重
[准确性]
"anti-gravity kill"
理由: Should be 'Antigravity bans' instead of 'anti-gravity kill' for better clarity. | Mistranslation of slang '反重力杀' (Antigravity bans) as literal 'anti-gravity kill'. | Should be 'Antigravity bans' not 'anti-gravity kill'. The source refers to account bans, not killing. | Mistranslation: '反重力杀' means 'Antigravity bans' not 'anti-gravity kill'. '反代' means 'reverse proxy' not 'anti-anti gravity'. |
| CohereLabs/tiny-aya-water | "en": "Title: Anti-gravity is so strict to kill, I dare not use it anymore. What kind of anti-Gravity do everyone use? (", "zh-TW": "標題:反重力殺得太嚴格了,我不敢再使用了。大家用的是什麼反重力?(", "ja": "タイトル:アンチグレヴィティは非常に厳しく殺されるので、もう使用しないかもしれません。みんなは何のアンチグレヴィティを使用していますか?(", "ko": "제목: 반중력 살기가 너무 엄격해서 더 이상 사용하지 않을 것 같아요. 모두들 어떤 반중력을 사용하고 있나요?(", "de": "Titel: Die Antigravitation ist so streng, dass ich sie nicht mehr verwenden kann. Was für eine Antigravitation benutzen alle anderen?(), "fr": "Titre : L'antigravité est si stricte à tuer qu'on n'ose plus l'utiliser. Quelle antigravité utilisent les autres ?(), "it": "Titolo: L'antigravità è così severa da non osare più usarla. Quale antigravità usano gli altri?" } | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Anti-gravity is too strict, I dare not use it anymore. What anti-reverse proxies do you guys use? ( Content: CPA Now I only dare to use anti-CLI, and for anti-gravity, I just set up an ordinary account using 2.5f and 2.5f lite. Previously, two family group Pro accounts were killed, but after appealing to Google, they were returned. However, if they get killed again, it will result in permanent banishment ( |
严重
[准确性]
"anti-CLI"
理由: Should be 'reverse proxied CLI' to match reference | Mistranslation of '反 CLI' (reverse proxied CLI). 'Anti-CLI' changes the meaning entirely. | Should be 'reverse proxied CLI' not 'anti-CLI' |
| Qwen/Qwen2.5-7B-Instruct | Title: Anti-gravity kills so well, I dare not use it anymore. What anti-proxy do you all use? ( |
严重
[准确性]
"Anti-gravity kills so well"
理由: Mistranslation of '反重力杀的好严啊' - should be 'Antigravity bans are so strict' not 'kills so well' | Mistranslation of slang '反重力' (Antigravity, a proxy tool) and '杀' (ban). The hypothesis implies the tool is effective at killing, whereas the source means the tool is being banned strictly. | Mistranslation: '杀' means 'ban/kill' (as in account ban), not literal killing. Should be 'bans are so strict' | The term 'kills' is used instead of 'bans', which is more accurate in the context of account restrictions. |
| Qwen/Qwen3-14B | L'anti-gravità è così rigoroso che non lo uso più. Cosa usate tutti come anti-proxy? ( |
致命
[准确性]
"L'anti-gravità è così rigoroso che non lo uso più. Cosa usate tutti come anti-proxy? ("
理由: The hypothesis translates only the title and omits the entire content paragraph. Additionally, 'anti-gravità' is a literal mistranslation of the slang '反重力' (referring to a specific proxy tool), and 'anti-proxy' mistranslates '反代' (reverse proxy). | Translation is in Italian instead of target language Chinese (zh). Complete language mismatch. | Omits the main content about CPA, the use of reverse CLI, the account types (Pro, Family Plan), the 2.5f/2.5f lite versions, and the bans and appeals to Google; in addition, "anti-proxy" misreads "反代" (reverse proxy). |
| Qwen/Qwen3.5-4B | Title: Anti-gravity is killing me so hard, I dare not use it anymore. What anti-proxies does everyone use? (Content: For now, I only dare to use anti-CLI. I've only used a regular account with 2.5f and 2.5f lite for anti-gravity. Previously, two Pro accounts from my family group were banned. I appealed to Google and they let me back in, but if they ban again, it's permanent. |
严重
[准确性]
"Anti-gravity is killing me so hard"
理由: Mistranslation of '反重力杀的好严啊' - should be 'Antigravity bans are so strict' not 'killing me so hard' | Misinterprets “反重力杀的好严啊” which refers to strict banning by Antigravity service, not the speaker being killed; should convey strict ban policy | Mistranslation of slang '杀的好严' (bans are strict) as 'killing me', changing the meaning entirely.
轻微
[术语]
"anti-proxies"
理由: 'Reverse proxies' is the standard term for '反代' in this context, 'anti-proxies' is non-standard. | Should be 'reverse proxies' not 'anti-proxies' | Should be 'reverse proxied CLI' as per reference. |
| Qwen/Qwen3.5-9B | Title: Anti-gravity enforcement is so strict now that I dare not use it anymore. What anti-detection tools do everyone else use? ( Content: As for CPA, I only dare to use anti-CLI now. Anti-gravity was used on just one regular account with 2.5f and 2.5f lite; both my previous Pro family accounts were banned. After appealing, Google restored them, but another ban would result in a permanent ban. |
严重
[术语]
"Anti-gravity enforcement"
理由: Mistranslation of slang '反重力' (Antigravity/Google Voice proxy) as literal 'Anti-gravity'. Should be 'Antigravity' or transliterated based on community usage. | “反重力杀” refers to Antigravity bans, not general enforcement; wording slightly off but meaning mostly preserved. | '反重力' is a product/service name (Antigravity), not 'anti-gravity enforcement' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Translation of '反重力' in en |
轻微
[准确性]
"Translation of '反重力' in en"
理由: Does not translate the source content; it is just a meta sentence about translating a term, omitting almost all information from title and content. | Hypothesis is incomplete and does not provide actual translation. Should translate '反重力' as 'Antigravity' based on reference. | The hypothesis is a meta-commentary describing the translation task rather than providing the actual translation of the source text. It fails to translate the title or content. | Hypothesis is not a translation but a description of what the translation should be. Missing entire content translation. |
| google/gemma-3-12b-it | Title: Anti-gravity kill is so severe, I don't even dare to use it anymore. What proxies are everyone using? ( Content: CPA, I only dare to use anti-CLI now. I only used one ordinary account for anti-gravity with 2.5f and 2.5f lite. My previous two family group Pro accounts were killed. Appealed to Google and they were restored, but getting killed again results in permanent ban. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of '反重力杀的好严啊,都不敢用了,大家用的都是什么反代?(CPA) I am currently only using CPA's reverse CLI, reversing gravity only has one ordinary account with 2.5f and 2.5f Lite, previous two families were killed, Google was returned, but after that, it will be permanently banned () |
轻微
[准确性]
"previous two families were killed"
理由: Hypothesis is not a proper translation but a mix of source and broken English. Missing proper sentence structure and context. '反代' should be 'reverse proxy' not 'reverse CLI'. '家庭组Pro号' means 'Family Plan Pro accounts' not 'families'. | Source means "two Family Plan Pro accounts were banned"; "families were killed" mistranslates both "家庭组" and the ban metaphor. | The hypothesis includes meta-text ('Translation of...') and fails to translate the content properly. It mistranslates slang terms: '反重力' (Antigravity/proxy tool) as 'reversing gravity', '家庭组' (Family Plan) as 'families', '杀了' (banned) as 'killed', and '放回来了' (restored/unbanned) as 'Google was returned'. The structure is incoherent. | The hypothesis is a mix of Chinese and English, not a proper translation to Chinese. It contains the original Chinese text mixed with English translations and is incomplete. |
| google/gemma-3-4b-it | The anti-gravity kill is too harsh, I dare not use it anymore, what anti-bots do you guys use? | 通过 (无共识错误) |
| google/translategemma-12b-it | The anti-gravity proxy is so good at killing accounts, I don't even dare to use it anymore. What proxies are you all using? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Anti-gravity is too good to be true, I can't use it anymore, what anti-proxy do you guys use? |
严重
[准确性]
"Anti-gravity is too good to be true"
理由: Misinterprets “杀的好严” which refers to strict bans, not quality; should convey strict banning/enforcement | Misinterpretation of '杀' as 'good' instead of 'strict' | Mistranslation of '杀的好严' (bans are strict) as an idiom meaning 'too good to be true', completely reversing the meaning. | Mistranslation of '杀的好严啊' which means 'bans are so strict' not 'is too good to be true'. The phrase refers to strict enforcement/banning, not quality. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '反重力杀的好严啊,都不敢用了,大家用的都是什么反代?' in en", "zh-TW": "翻译 "'反重力杀得好厉害啊,大家都不敢用了,大家都是用什么反代?'"", "ja": "翻译 '反重力殺得好厲害啊,大家都不敢用了,大家都是用什麼反代?'"", "ko": "翻译 '반중력이 아주 심하네요, 모두 사용할 수 없어요, 모두 뭐로 반대를 사용하나요?', "de": "翻译 'Anti-Gravity ist wirklich hart, niemand kann es verwenden, was verwenden alle für eine Anti-Version?', "fr": "翻译 'Anti-Gravitation est très difficile, personne ne peut l'utiliser, quelle version anti utilisez-vous tous?', "it": "翻译 'Antigravedazione è molto difficile, nessuno può usarla, con quale versione anti utilizzate tutti?'} |
严重
[准确性]
"Translation of '反重力杀的好严啊,都不敢用了,大家用的都是什么反代?' in en"
理由: The hypothesis fails to translate the source text, instead outputting a meta-description of the translation task. It also includes unprompted translations into other languages (zh-TW, ja, ko, de, fr, it) which were not requested. | The translation does not provide the actual content but only mentions the translation. | Hypothesis is placeholder text, not actual translation |
| tencent/HY-MT1.5-7B | The anti-gravity mode is so strict that I don’t even dare to use it anymore. What anti-gaming methods are others using? |
严重
[准确性]
"What anti-gaming methods are others using?"
理由: Severe mistranslation of '反代' (reverse proxy) as 'anti-gaming', completely changing the meaning. | Incorrect translation of '反代'. Should be 'reverse proxies' not 'anti-gaming methods'. The source discusses proxy/reverse proxy usage, not gaming. | Missing information about account bans and appeal process. | Misinterprets “反代” which refers to reverse proxying, not anti-gaming methods, and omits title-style nuance and emoji-like parenthetical. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: A little OpenClaw (Lóngxiā) safety risk reminder and thought |
严重
[准确性]
"A little OpenClaw (Lóngxiā) safety risk reminder and thought"
理由: Should be translated to 'OpenClaw' instead of using pinyin with tones. | Only the title is translated; the key information in the body about personal AI agents pushing large models into real-world environments is missing. | Should be 'Some Reminders and Thoughts on the Security Risks of OpenClaw'. The hypothesis misses the main content about security risks. | Incorrect translation: '一点' means 'some' or 'a few', not 'a little'. Should be 'Some Reminders and Thoughts on' not 'A little'. |
| CohereLabs/tiny-aya-water | Content: OpenClaw and similar personal AI entities are pushing large language models from dialogue interfaces to actual environments (this is also the core reason for Crab's outcry). AI Agents are no longer just tools for programmers. |
严重
[准确性]
"Crab's outcry"
理由: Incorrect translation of '龙虾出圈' - should be 'OpenClaw went viral' not 'Crab's outcry'. Also '龙虾' (lobster/crayfish) is a nickname for OpenClaw, not 'Crab' | "龙虾" refers to OpenClaw (lobster) itself, not "Crab"; and 出圈 means going viral, not "outcry". | Mistranslation of '龙虾出圈'. '出圈' means 'going viral' or 'breaking out of the circle', not 'outcry'. 'Crab' is also an inconsistent translation for 'OpenClaw' (which implies a tool, not the animal itself in this context). |
| Qwen/Qwen2.5-14B-Instruct | Title: A Reminder and Reflection on Security Risks of OpenClaw (Lobster) Content: Personal AI agents like OpenClaw and similar ones are moving large language models from conversational interfaces to real-world environments (this is also the core reason for Lobster's breakout; AI Agents are no longer exclusive tools for programmers). |
严重
[准确性]
"Lobster's breakout"
理由: Failure to recognize '龙虾' (Lobster) as a nickname for 'OpenClaw'. The translation treats it as a separate entity or literal animal, whereas the reference correctly links it or omits the literal translation in favor of the project name. | Reference uses 'OpenClaw went viral' - 'Lobster' is a mistranslation of the nickname, should keep 'OpenClaw' | Should refer to 'OpenClaw went viral' not 'Lobster's breakout'. The reference clarifies this is about OpenClaw, not the slang term 'lobster' |
| Qwen/Qwen2.5-7B-Instruct | Title: A Bit of OpenClaw (Crab) Security Alert and Reflection Content: OpenClaw and similar personal AI agents are pushing large language models from conversational interfaces into real-world environments (this is also the core reason for the Crab's breakout, AI Agents are no longer exclusive tools for programmers) |
严重
[准确性]
"(Crab)"
理由: Mistranslation of '龙虾' which means 'Lobster'. 'Crab' is incorrect. Also inconsistent with the nickname usage in the community context. | Source explains OpenClaw as '龙虾' (lobster), not 'crab'. | Mistranslation - '龙虾' means 'Lobster' not 'Crab' |
| Qwen/Qwen3-14B | Promemoria e riflessioni sui rischi di sicurezza di OpenClaw (Lobster) |
致命
[准确性]
"Promemoria e riflessioni sui rischi di sicurezza di OpenClaw (Lobster)"
理由: The hypothesis translates only the title and completely omits the content paragraph describing the function of OpenClaw. | Translates only the title and completely ignores the content about the role of OpenClaw and personal agents in bringing LLMs from the chat interface to real-world environments. | Translation is in Italian instead of target language Chinese (zh). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: A bit about OpenClaw (Crab) security risk reminders and reflections Content: OpenClaw and similar personal AI agents are pushing large language models from chat interfaces to real-world environments (this is also the core reason why Crabs have gained popularity; AI Agents are no longer exclusive tools for programmers). |
严重
[准确性]
"(Crab)"
理由: Failed to recognize '龙虾' (Lobster) as the specific nickname for OpenClaw, translating it literally as 'Crab' which contradicts the source. | Should be 'OpenClaw' or 'Lobster' not 'Crab' - '龙虾' means lobster, not crab | “龙虾” means “lobster”, not crab; mistranslation of the nickname
严重
[准确性]
"why Crabs have gained popularity"
理由: Should be 'OpenClaw went viral' not 'Crabs have gained popularity' | Propagates the error of calling the project 'Crabs' instead of OpenClaw/Lobster. | Refers specifically to OpenClaw (Lobster) going viral, not crabs in general |
| Qwen/Qwen3.5-9B | Title: A Little OpenClaw (Lobster) Security Risk Alert and Reflection Content: Personal AI agents like OpenClaw and similar ones are pushing large language models from chat interfaces to real-world environments (this is also the core reason why Lobster went viral, as AI Agents are no longer exclusive tools for programmers). | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | One OpenClaw (Shrimp) security risk warning and thoughts |
严重
[准确性]
"One OpenClaw (Shrimp) security risk warning and thoughts"
理由: Mistranslates '龙虾' (lobster/shrimp) as a nickname for OpenClaw. Reference shows it should be 'Some Reminders and Thoughts on the Security Risks of OpenClaw' without the shrimp reference in title. | Mistranslation of '龙虾' which means 'lobster', not 'shrimp'. The reference correctly identifies the context or omits the literal translation where appropriate, but 'shrimp' is factually incorrect. | “一点” here means ‘some’ rather than the numeral ‘one’; also mistranslates “龙虾” which is a nickname for OpenClaw, not generic ‘Shrimp’ in the title. |
| google/gemma-3-12b-it | Title: A Few OpenClaw (Lobster) Security Risks and Reflections Content: Personal AI agents like OpenClaw and similar ones are pushing large language models from conversational interfaces to real-world environments (this is also the core reason for OpenClaw’s popularity – AI Agents are no longer exclusive tools for programmers). | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of '一点OpenClaw(龙虾)安全风险提醒和思考' in en |
致命
[准确性]
"Translation of '一点OpenClaw(龙虾)安全风险提醒和思考' in en"
理由: No meaningful translation provided. | The hypothesis is not a proper translation; it indicates it is a translation to English but the target language is Chinese (zh). Also, the content is missing entirely. | Hypothesis is incomplete and does not provide actual translation. It only states 'Translation of...' without delivering the translation itself. | The hypothesis contains only meta-text describing the task instead of providing the actual translation. The content is completely missing. | No actual translation is provided; the hypothesis is a meta-statement about translation, not the translation itself. |
| google/gemma-3-4b-it | Title: A Little Reminder About OpenClaw (Lobster) Security Risks and Thoughts Content: OpenClaw and similar personal AI agents are pushing large language models from conversational interfaces to real-world environments (this is also the core reason for OpenClaw going off the rails – AI Agents are no longer exclusive tools for programmers). |
严重
[准确性]
"going off the rails"
理由: Mistranslation of “出圈”, which here means going viral/becoming popular, not malfunctioning or going crazy. | Mistranslation of '出圈' (going viral/breaking out of the niche). 'Going off the rails' implies losing control or failing, which is the opposite meaning. | Hypothesis translates '出圈' as 'going off the rails' but reference correctly translates it as 'went viral', which is the appropriate meaning in this context |
| google/translategemma-12b-it | Title: A Few Security Risks and Thoughts Regarding OpenClaw (Lobster) |
轻微
[准确性]
"A Few Security Risks and Thoughts Regarding OpenClaw (Lobster)"
理由: Missing 'Reminders' from the title. Should be 'Some Reminders and Thoughts on the Security Risks of OpenClaw' to match the source's emphasis on '提醒' (reminders). | Missed the nuance of '提醒' (reminders/warnings) in the title, translating it as just 'Risks'. Missed translating the Content section entirely. | Adds '(Lobster)' gloss not present in source/title; otherwise accurate but slightly interpretive. |
| google/translategemma-4b-it | Title: A Little Reminder about OpenClaw (Lobster) Security Risks and Considerations | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '一点OpenClaw(龙虾)安全风险提醒和思考' in en |
严重
[准确性]
"Translation of '一点 OpenClaw(龙虾) 安全风险提醒和思考' in en"
理由: The hypothesis is a meta-string describing the task rather than the actual translation of the source title and content. The content is completely missing. | Hypothesis is placeholder text, not actual translation | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | "en": "Title: A Reminder and Reflection on the Security Risks of OpenClaw (Lobster)", "zh-TW": "标题:關於OpenClaw(龍蝦)的安全風險提醒與反思", "ja": "タイトル:OpenClaw(ロブスター)のセキュリティリスクに関する注意喚起と考察", "ko": "제목: OpenClaw(로브스터)의 보안 위험에 대한 경고와 고찰", "de":="Titel: Eine Erinnerung und Reflexion über die Sicherheitsrisiken von OpenClaw (Lobster)", "fr":="Titre : Une mise en garde et une réflexion sur les risques de sécurité d’OpenClaw (Lobster)", "it":="Titolo: Un promemoria e una riflessione sui rischi di sicurezza di OpenClaw (Lobster)"} |
严重
[准确性]
"en: "A Reminder and Reflection on the Security Risks""
理由: Adds '(Lobster)' clarification not present in source title. Reference correctly omits this parenthetical. | Slight nuance shift from “Some Reminders and Thoughts” to “A Reminder and Reflection”, weakening the idea of multiple reminders and general ‘thoughts’. | Does not match the reference content. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: It can be used for aggregating Pokémon GO, Wind Whisperer and other internal airport subscriptions. Boyfriend feedback shows that when using the Pokémon GO airport, there is often an issue with IP being banned. Manual node switching can solve this. |
严重
[准确性]
"for aggregating Pokémon GO, Wind Whisperer and other internal airport subscriptions"
理由: Should be 'Pokémon' instead of 'Pokémon GO' as it's not specified in the source. | Mistranslation of '宝可梦' which is a codename for a proxy provider, not the game Pokémon GO. | Misreads “站內宝可梦,风萧萧和其他自用机场” as “Pokémon GO, Wind Whisperer” and “airport subscriptions”; these are actually proxy (“airport”) services on the site named “宝可梦” and “风萧萧”. | Should be 'Pokémon' (referring to a proxy service name), not 'Pokémon GO' which is a game. | Mistranslation: '站内宝可梦' refers to a proxy provider named 'Pokemon' on the forum, not 'Pokémon GO' the game. '风萧萧' is a username, not 'Wind Whisperer'. '佬友' means 'bro' (forum user), not 'Boyfriend'. '机场' means proxy provider, not literal 'airport'.
严重
[准确性]
"Boyfriend feedback"
理由: Should be 'Bros reported' not 'Boyfriend feedback'. 'Bros' is the correct slang term used in the community. | Mistranslation of slang '佬友' (bros/fellow users) as 'Boyfriend'. | Mistranslates “佬友反馈” as “Boyfriend feedback”, a complete category error; the original means “feedback from friends/fellow forum veterans”. |
| CohereLabs/tiny-aya-water | Title: Subscription Aggregation! Aggregate multiple airport subscriptions into one configuration file, solve L Station IP ban problems. |
严重
[准确性]
"Title: Subscription Aggregation! Aggregate multiple airport subscriptions into one configuration file, solve L Station IP ban problems."
理由: 'L 站' refers to the specific community 'LINUX DO'. 'L Station' is a meaningless literal translation. | Misses that "机场" here means proxy/VPN providers (as in reference) and that L站 refers specifically to LinuxDo. Also omits the entire content discussing Pokémon, 风萧萧, node switching, and IP bans. | Incorrect translation - should be 'LINUX DO IP ban issue' not 'L Station IP ban problems' |
| Qwen/Qwen2.5-14B-Instruct | Title: Subscription Aggregation! Aggregate subscriptions from multiple proxy providers into one profile to solve L site IP ban issues. Content: Can be used for aggregating station-based Pokémon, Fengxiaoxiao, and other personal proxy providers. Feedback from users indicates frequent IP bans when using the station's Pokémon proxy provider. This can be resolved by manually switching nodes. |
严重
[准确性]
"L site IP ban issues"
理由: Reference uses 'LINUX DO' - 'L site' is incorrect translation of the forum name | Should be 'LINUX DO IP ban issue' not 'L site IP ban issues' | Original "L站ip" is a specific site (LinuxDo); "L site" is vague and may lose this reference, though partially preserved by initial. | Ambiguous translation of 'L 站'. The reference clarifies it as 'LINUX DO', a specific community name. |
| Qwen/Qwen2.5-7B-Instruct | Title: Subscription Aggregation! Combine multiple proxy providers into one configuration file to solve L station IP ban issues Content: Can be used to aggregate Pokémon inside the station, FFXIV, and other personal proxies. Feedback from friends using the station's Pokémon proxy often results in IP bans, which can be resolved by manually switching nodes. |
严重
[准确性]
"FFXIV"
理由: Incorrect term - should be 'Fengxiaoxiao' not 'FFXIV' | Hallucination/Mistranslation. '风萧萧' is a specific name (likely a proxy provider name), not the game FFXIV. | '风萧萧' is a site/name, not the game FFXIV; mistranslation of a proper noun. | Incorrect: '风萧萧' is a username/person's handle, not the game FFXIV. Reference correctly shows it as 'Fengxiaoxiao' |
| Qwen/Qwen3-14B | Aggregazione degli abbonamenti! Aggregare più abbonamenti dei fornitori di proxy in un singolo file di configurazione per risolvere i problemi di blocco IP del sito L |
致命
[准确性]
"Aggregazione degli abbonamenti! Aggregare più abbonamenti dei fornitori di proxy in un singolo file di configurazione per risolvere i problemi di blocco IP del sito L"
理由: The hypothesis translates only the title and omits the entire content paragraph detailing specific providers and solutions. | Omits the specific provider names (宝可梦, 风萧萧) and the explanation about manually switching nodes to avoid IP bans. | Translation is in Italian instead of target language Chinese (zh). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Subscription Aggregation! Aggregate multiple airport subscriptions into one configuration file to solve L-site IP ban issues. Content: Can be used to aggregate Pokémon within the site, Feng Xiaoxiao and other self-use airports. Users have reported that when using the site's Pokémon airport, IP bans frequently occur. Manual node switching can resolve this. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Subscription Aggregation! Aggregate multiple proxy subscriptions into one configuration file, solving the issue of L-site IPs being banned. Content: Can be used to aggregate Pokemon within the site, Feng Xiaoxiao and other self-used proxies. Users have reported that when using the site's Pokemon proxy, IP bans occur frequently. This can be resolved by manually switching nodes. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Subscription Aggregation! Combine multiple airports' subscriptions into one configuration file, solving the L station IP ban issue. |
严重
[术语]
"airports' subscriptions"
理由: Incorrect terminology: 'airports' should be 'proxy providers' or 'VPN services'. Also 'L station' should be 'LINUX DO' (the forum name). | Literal translation of the slang '机场' (proxy provider). In this context, it should be translated as 'proxy providers' or similar, not 'airports'. | “机场” in this context refers to proxy/VPN providers, not literal airports; using ‘airports’ is misleading.
严重
[准确性]
"L station"
理由: Source says “L站”, and reference disambiguates as a specific site (“LINUX DO”); leaving as “L station” is vague and potentially confusing. | Failure to recognize 'L 站' as a specific community name (LINUX DO). Translating it as 'L station' loses the specific reference. | Incorrectly translates 'L站' as 'L station'. Reference shows it should be 'LINUX DO' which is a specific forum/community name. |
| google/gemma-3-12b-it | Title: Subscription Aggregation! Aggregate multiple proxy provider subscriptions into one configuration file to solve the problem of banned IPs on L stations. Content: Can be used to aggregate internal Pokémon, Feng Xiaoxiao and other self-use proxy providers. Users have reported that when using the internal Pokémon proxy provider, IP bans often occur. This can be solved by manually switching nodes. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Can be used to aggregate station-level Pokémon, with a sense of loneliness and other private airports. |
严重
[准确性]
"station-level Pokémon"
理由: Mistranslation of slang: '风萧萧' (a proper name/nickname for a provider) is translated literally as 'a sense of loneliness'. '机场' (proxy provider) is translated as 'airports'. 'L 站' (LINUX DO site) is omitted. The title is also missing from the hypothesis. | Severely mistranslated. '风萧萧' is a proper name/nickname, not 'a sense of loneliness'. Missing context about subscription aggregation and IP ban issues. Incomplete translation of source content. | Mistranslation of '站内宝可梦' and addition of irrelevant phrase 'with a sense of loneliness'. | The hypothesis is completely wrong and unrelated to the source. It mistranslates '站内宝可梦' (a proxy provider name) as 'Pokémon', '风萧萧' (a name) as 'a sense of loneliness', and '机场' (proxy provider) as 'airports'. The title '订阅聚合' is not translated at all. | "站内宝可梦" refers to a specific provider named 宝可梦 on the site, not "station-level" Pokémon. |
| google/gemma-3-4b-it | Title: Subscribe Aggregator! Aggregate multiple airport subscriptions into one configuration file to solve L station IP ban issues. Content: It can be used to aggregate Pokemon inside, Feng Xiaoxia and other self-use airports. Fellow users reported that when using Pokemon airport, they often encounter IP bans. You can manually switch nodes to resolve this. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Subscription Aggregation! Aggregate multiple proxy subscriptions into one configuration file to solve the problem of being banned from L stations due to IP address. |
严重
[准确性]
"L stations"
理由: Inaccurate phrasing. Should reference 'LINUX DO IP ban issue' or similar. The source mentions 'L站ip被ban' which refers to a specific platform's IP ban problem. | Failed to translate the slang 'L 站' (LINUX DO), resulting in nonsensical 'L stations'. Missed translating the Content section. | Misinterprets “L站” which in context refers to LinuxDo; “L stations” is incorrect. |
| google/translategemma-4b-it | Title: Subscribe Aggregator! Aggregate multiple airport subscriptions into one configuration file to solve L station IP ban issues. |
严重
[术语]
"airport subscriptions"
理由: Should be 'proxy provider subscriptions' not 'airport subscriptions'. '机场' is a colloquial term for proxy providers in this context. | “机场” in this community context means proxy/VPN provider; “airport” is literal and non-idiomatic in English tech slang | Literal translation of slang '机场' (proxy provider) instead of using the correct technical term or context-appropriate translation.
严重
[准确性]
"L station IP ban issues"
理由: “L 站” likely refers to a specific site (LinuxDo); “station” is unnatural though overall meaning is understandable | Failure to translate the specific community name 'L 站' (LINUX DO), rendering it obscure. | Should be 'LINUX DO' not 'L station'. The source refers to a specific platform abbreviated as 'L站' which stands for LINUX DO. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '订阅聚合!将多个机场订阅聚合到一个配置文件,解决L站ip被ban问题' in en", "zh-TW": "Translation of '订阅聚合!將多個機場訂閱聚合到一個設定檔,解決L站ip被ban問題' in zh-TW", "ja": "Translation of '订阅聚合!将多个机场订阅聚合到一个配置文件,解决L站ip被ban问题' in ja", "ko": "Translation of '订阅聚合!将多个機场订阅聚合到一份設定档,解决L站ip被ban问题' in ko", "de": "Translation of 'Subscription Aggregation! Aggregating multiple airport subscriptions into one configuration file to solve the L-station IP ban issue' in de", "fr": "Translation of 'Subscription Aggregation ! Agrégant plusieurs abonnements d'aéroports dans un fichier de configuration pour résoudre le problème du blocage de l'IP de la station L' ' in fr", "it": "Translation of 'Subscription Aggregation! Aggregating multiple airport subscriptions into one configuration file to resolve the L-station IP ban issue' in it} |
致命
[准确性]
"entire hypothesis"
理由: Mostly meta placeholders 'Translation of ... in X' instead of actual translations, and where English is given it mistranslates ‘机场’ as airport instead of proxy/VPN provider and ‘L站’ as 'L-station' rather than the specific site. | Hypothesis is placeholder text, not actual translation | Hypothesis provides metadata labels for multiple language translations rather than actual translation content. The format is incorrect and no substantive translation is provided. |
| tencent/HY-MT1.5-7B | Subscription Aggregation! Combine multiple proxy providers into one configuration file to solve the issue of L-site IPs being banned. |
轻微
[准确性]
"to solve the issue of L-site IPs being banned"
理由: “L站ip被ban问题” more specifically refers to Linux.do IP bans in reference; “L-site” is vague and omits DO/linuxdo nuance. | Vague translation. Reference clarifies this as 'LINUX DO IP' which is a specific platform reference, not generic 'L-site'. | Does not fully capture the problem-solving aspect. | Loss of specific community slang meaning; 'L 站' refers to 'LINUX DO', not a generic 'L-site'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: ChatGPT Team Slot*4 |
严重
[准确性]
"Title: ChatGPT Team Slot*4"
理由: Should be 'Shared Seats' not 'Slot'. The reference uses 'Shared Seats' which better conveys the meaning of shared subscription access. | Incorrect translation: '车位' means 'shared seat/subscription slot', not 'slot'. Missing the concept of 'shared seats'. | Translates only the title, and “Slot*4” does not convey the “carpooling/shared seat” meaning; compared with the reference's “Shared Seats * 4”, the information is incomplete. |
| CohereLabs/tiny-aya-water | Title: ChatGPT Team Slot*4 | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: ChatGPT Team Subscription Slots *4 Content: Please send me a private message after winning. Just opened, should be quite stable but no promises made. Suggested to use an alternative account. |
轻微
[术语]
"Subscription Slots"
理由: "车位" in this context is shared seats/slots in a carpool-like sharing; "subscription slots" is acceptable but slightly less idiomatic. | Should be 'Shared Seats' to better convey the meaning of shared subscription | '车位' literally means 'parking spot' but in this context refers to 'shared seats' or 'spots' in a group buy. 'Subscription Slots' is acceptable but 'Shared Seats' (reference) captures the 'carpooling' nuance better. |
| Qwen/Qwen2.5-7B-Instruct | Title: ChatGPT Team Slot*4 Content: Please PM me after winning, just started, should be stable but no guarantees, suggest using a secondary account | 通过 (无共识错误) |
| Qwen/Qwen3-14B | ChatGPT Team Posto*4 |
致命
[准确性]
"ChatGPT Team Posto*4"
理由: Translates only the title and ignores all the instructions about contacting via DM, stability, the lack of guarantees, and the recommendation to use a secondary account. | The hypothesis translates only the title and omits the entire content paragraph regarding instructions and warnings. | Translation is in Italian instead of target language Chinese (zh). Incomplete translation - only title provided, no content. |
| Qwen/Qwen3.5-4B | Title: ChatGPT Team Slot*4 Content: Please DM me after winning. Just opened, should be stable, but no guarantees made. Recommend using a secondary account. |
轻微
[术语]
"Slot*4"
理由: “车位*4” in this context is “shared seats / slots in a shared subscription”; “Slot*4” is vague without sharing context | Should be 'Shared Seats' not 'Slot' | Should be 'Shared Seats * 4' as per reference. |
| Qwen/Qwen3.5-9B | Title: ChatGPT Team Slots*4 Content: Please DM me after winning. Just opened, should be stable, but no promises made. Recommend using an alternate account. |
轻微
[术语]
"Slots*4"
理由: “车位” in this context refers to shared subscription seats; “slots” is acceptable but slightly less clear about shared nature. | '车位' literally means 'parking spot', slang for 'shared seat/spot'. 'Slots' is acceptable, but 'Shared Seats' captures the 'carpooling' nuance better. | '车位' means 'shared seats' or 'shared subscription slots', not just 'Slots'. Should be 'Shared Seats' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: ChatGPT Team Slot*4 Content: Please send me a private message if you win, it's newly opened and should be stable, but I don't make any guarantees. Suggest using a small account to join. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: ChatGPT Team Slots *4 Content: Please DM me after winning, newly opened, should be relatively stable, but no guarantees. Suggest using an alt account. | 通过 (无共识错误) |
| google/gemma-3-1b-it | After winning the prize, please chat privately and send me the details, it's new, should be quite stable, but no promises, suggest using a small seat. |
严重
[准确性]
"suggest using a small seat"
理由: '小号上车' means 'use an alt account to join' not 'using a small seat'. '上车' is idiomatic for joining a shared subscription, not literal seating. | The hypothesis is in English instead of Chinese (target language is zh). It also mistranslates '车位' (shared subscription slot) as 'seat' and '小号' (alt account) as 'small seat'. | Mistranslation of slang '小号' (alt account/burner account) as 'small seat'. '车位' in the title (shared seat/spot) is also not reflected in the translation structure, though the hypothesis only covers content. The phrase 'chat privately and send me the details' is slightly verbose but acceptable; the main error is 'small seat'. | "小号" means alt account; rendering it as "small seat" confuses with seat licensing and is incorrect. |
| google/gemma-3-4b-it | Title: ChatGPT Shared Slot *4 Content: After winning, please private message me to get it. It’s newly opened and should be stable, but I make no promises. Suggest using a small account to sign up. |
轻微
[术语]
"Team 车位*4 -> Shared Slot *4"
理由: “车位” in this context is more naturally ‘seats’ or ‘spots’ in a shared subscription rather than generic ‘slot’. | Mistranslation: '车位' (shared seat/slot in subscription) was incorrectly translated as 'slot' but contextually should be 'shared seat' or 'seat' | Hypothesis translates '车位' as 'Shared Slot' but reference correctly translates it as 'Shared Seats', which is more accurate for subscription sharing context | '车位' in this context is slang for 'seat' in a shared subscription. 'Slot' is acceptable but 'Seat' is more precise for 'Team' plans.
轻微
[术语]
"小号上车 -> small account to sign up"
理由: '小号' is better translated as 'alt account' or 'throwaway account' in this context. | Mistranslation: '上车' (join the shared subscription) was incorrectly translated as 'sign up' instead of 'join the shared subscription' or 'get on board' | “上车” is idiomatic for ‘join the shared subscription’; ‘sign up’ loses the group-buying nuance. |
| google/translategemma-12b-it | Title: ChatGPT Team Slots *4 | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: ChatGPT Team Slots *4 Content: After winning, please DM me to get one. Newly opened, should be stable, but no promises made. Recommend using a small account. |
轻微
[术语]
"Slots"
理由: Should be 'Shared Seats' not 'Slots'. The Chinese '车位' refers to shared subscription seats, not generic slots. | '车位' in this context refers to shared subscription seats; 'Slots' is acceptable but 'Shared Seats' is more precise per reference. | Should be 'Shared Seats' |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
严重
[准确性]
"Translation of '登录' in en"
理由: The hypothesis translates the wrong text ('登录' - Login) which does not appear in the source, and completely ignores the actual source content about ChatGPT Team seats. | The translation does not match the source content. | Hypothesis is placeholder text, not actual translation; completely unrelated to source content |
| tencent/HY-MT1.5-7B | ChatGPT Team Car Seats *4 | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is this proof that GPT-5.4 is shadowbanned? Gents |
严重
[准确性]
"Is this proof that GPT-5.4 is shadowbanned?"
理由: Should include 'downgraded' as in the reference. | Mistranslation of '降智' (intelligence downgrade/dumbed down) as 'shadowbanned'. These are different concepts. | The source “降智” means becoming dumber/being downgraded; the reference has “shadowbanned/downgraded”, but the translation keeps only the “shadowbanned” sense and loses the “reduced intelligence/downgrade” dimension. | The source says '降智' which means 'downgraded/shadowbanned', but the hypothesis only captures 'shadowbanned' without the full context. Also '佬' (bros) is missing. | Incorrect translation: '降智' means 'being downgraded/disabled' or 'dumbed down', not 'shadowbanned'. '佬' means 'bro' (colloquial for experienced user), not 'Gents'.
轻微
[风格]
"Gents"
理由: '佬' is informal slang; 'Gents' is too formal. 'Bros' or 'Guys' is appropriate. | The tone is too formal; 'Gents' doesn't capture the colloquial '佬'. | The source “各位佬” carries an online, brotherly tone; “Gents” is rather formal and stylistically off; “bros/guys” or similar would work. |
| CohereLabs/tiny-aya-water | Title: Is this proof that GPT-5.4 is shadowbanned? Guys |
严重
[准确性]
"Is this proof that GPT-5.4 is shadowbanned?"
理由: Incorrect translation of '降智' - should be 'downgraded/shadowbanned' but the hypothesis uses only 'shadowbanned' which is incomplete. Also '降智' more accurately means 'intelligence degradation' or 'downgraded' | "降智" means dumbing down/degradation in capability, not shadowban. The reference keeps both possibilities; hypothesis collapses to only shadowban, losing the downgrade meaning. | The source '降智' literally means 'intelligence reduction' or 'lobotomized' (model performance downgrade). While it can imply a shadowban, translating it directly as 'shadowbanned' loses the nuance of performance degradation unless context confirms it. However, the bigger issue is the omission of the Content translation. |
| Qwen/Qwen2.5-14B-Instruct | Title: Does this indicate that GPT5.4 has been nerfed? Fellow experts Content: Fellows, does this indicate that it has been nerfed? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Does this indicate GPT5.4 has been nerfed? Fellow experts Content: Fellow experts, does this indicate a downgrade? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Questo indica che GPT5.4 è stato depotenziato? |
致命
[准确性]
"depotenziato"
理由: The hypothesis translates only the title and omits the content. It also fails to translate the vocative '各位佬' (Bros) present in the source title. | Translation is in Italian instead of target language Chinese (zh). Complete language mismatch. | "降智" in this context is closer to "downgraded/nerfed/less intelligent"; "depotenziato" is acceptable but does not clearly convey the connotation of reduced intelligence. |
| Qwen/Qwen3.5-4B | Title: Does this prove GPT5.4 has been nerfed? Folks, Content: Folks, does this indicate degradation? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Does this mean GPT 5.4 has been nerfed? Everyone. Content: Everyone, does this indicate a downgrade? |
轻微
[术语]
"has been nerfed"
理由: '降智' literally means 'intelligence reduced'. 'Nerfed' is a good gaming slang equivalent, though 'downgraded' is more literal. Acceptable. | '降智' means 'shadowbanned/downgraded', not 'nerfed'. Reference correctly uses 'shadowbanned/downgraded' | “降智” here is closer to “dumbed down” or “downgraded in intelligence”; “nerfed” adds a gaming-balance nuance not explicit in source. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Is this showing that GPT 5.4 is shadowbanned? Guys |
严重
[准确性]
"Is this showing that GPT 5.4 is shadowbanned?"
理由: Mistranslation of '降智' (literally 'intelligence reduction', slang for model performance degradation/dumbification). 'Shadowbanned' refers to visibility restriction, which is a different concept. | Mistranslates '降智' as 'shadowbanned'. Reference shows it should be 'shadowbanned/downgraded' but the primary meaning is capability degradation, not shadowban. | “降智” means downgraded/degraded intelligence, not ‘shadowbanned’; misses the nuance present in the reference. |
| google/gemma-3-12b-it | Title: Does this indicate that GPT-5.4 has been nerfed? Experts, Content: Experts, does this indicate a downgrade? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of '这能说明GPT5.4 降智了吗?各位佬' in en |
致命
[准确性]
"Translation of '这能说明GPT5.4 降智了吗?各位佬' in en"
理由: No meaningful translation provided. | The hypothesis is not a proper translation; it indicates it is a translation to English but the target language is Chinese (zh). Also, the content is missing entirely. | Hypothesis does not provide actual translation, only states 'Translation of...' without delivering the translation. | The hypothesis contains only meta-text describing the task instead of providing the actual translation. The term '降智' (downgraded intelligence/shadowban) and '佬' (bros/experts) are not translated. | No translation content is given; it only describes that a translation should be provided. |
| google/gemma-3-4b-it | Can this explain whether GPT 5.4 has been downgraded? Fellow experts. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Does this indicate that GPT-5.4 has been nerfed? Experts, please advise. Content: Experts, does this indicate a downgrade? |
轻微
[术语]
"Has been nerfed"
理由: Should be '影禁/降级' instead of 'nerfed' | '降智' is often translated as 'downgraded' or 'lobotomized' in this context; 'nerfed' is close but slightly different gaming slang. 'Experts' for '佬' misses the community slang 'Bros'. Missing Content translation. | Original nuance is about downgrade/‘降智’ and possibly shadowban; ‘nerfed’ is colloquial game slang and may slightly shift meaning, though broadly similar. | Should be 'shadowbanned/downgraded' not 'nerfed'. The reference uses 'shadowbanned/downgraded' which better captures '降智'. |
| google/translategemma-4b-it | Can this explain GPT5.4 being 'shadowbanned'? |
严重
[准确性]
"being 'shadowbanned'?"
理由: '降智' literally means 'intelligence reduction' or 'downgraded performance'. 'Shadowbanned' is a different concept, though related in outcome, it changes the specific meaning. | Misinterpretation of '降智' as 'shadowbanned' | The source says '降智' which means 'downgraded/shadowbanned' but the hypothesis only mentions shadowbanned. Also, the title should reference GPT-5.4 not GPT5.4, and should include 'Bros' as in the reference. | Source “降智” means nerfed/dumbed-down, possibly downgraded; “shadowbanned” changes the type of issue from quality downgrade to access restriction |
| tencent/HY-MT1.5-1.8B | Translation of '这能说明GPT5.4 降智了吗?各位佬' in en |
严重
[准确性]
"Translation of '这能说明 GPT5.4 降智了吗?各位佬' in en"
理由: The hypothesis is a meta-string describing the task rather than providing the translation. The content is missing. | Hypothesis is placeholder text, not actual translation | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | Title: Does This Indicate That GPT5.4 Has Been Downgraded? Guys | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【NihaoAPI】Free API Site Launch! Content: The models include GPT-1 family, Claude, GML, Minimax - not guaranteed to be available. Users mainly come here to test Codex. Supports LinuxD0 registration (we learned a lot here with users and it's our way of giving back). However, server performance is not great, so if we get too many registrations, we might have to close them down. Come on, guys! |
严重
[准确性]
"GPT-1 family"
理由: Multiple errors: 'gpt 一家' means 'GPT family', not 'GPT-1 family'. 'glm' should be 'GLM' (not capitalized incorrectly). '蹬' is slang for 'use heavily/spam', not 'test'. 'linuxdo' is a forum name, not 'LinuxD0' (zero). '佬友' means 'bros', not 'users'. '上车' means 'join/get on board (the subscription)', not literal 'come on'. | The source “gpt 一家” refers to the GPT series, not GPT-1; rendering it as “GPT-1 family” is seriously misleading. | Should be 'GPT family' not 'GPT-1 family'. The source says 'gpt 一家' meaning the GPT family of models. | Mistranslation of 'gpt 一家' (the GPT family/provider) as 'GPT-1', implying a specific version.
严重
[准确性]
"GML"
理由: Should be 'GLM' not 'GML'. This is a specific model name. | Incorrect acronym for 'glm' (GLM). | Misspells “glm” as “GML”, a proper-noun mistranslation.
轻微
[准确性]
"Users mainly come here to test Codex"
理由: '蹬' implies heavy usage/milking resources, 'test' is too mild. | Should be 'heavily milk/spam Codex' not 'test Codex'. The reference indicates heavy usage/exploitation. | “蹬 codex” is closer to “milking/freeloading on/hammering Codex”, not merely “testing”; the tone is playful.
严重
[准确性]
"LinuxD0"
理由: Miswrites “linuxdo” as “LinuxD0” (letter O turned into digit 0), a proper-noun error. | Typo in proper noun 'LINUX DO'. | Should be 'LINUX DO' not 'LinuxD0'. This is the name of the forum/community. |
| CohereLabs/tiny-aya-water | Title: 【NihaoAPI】Free API Station Opened! Content: The models are mainly GPT, Claude, GLM, and Minimax, but availability is not guaranteed. Users primarily visit here to test Codex. Registration is currently only supported on LinuxDo (we learned a lot from you guys here, and it's also our way to give back). However, server performance isn't very good, so if we receive too much traffic, we might have to close registration. Come and join us! | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: [HelloAPI] Free Shared API Site is Now Open! Content: The main model is GPT; Claude, GLM, and Minimax availability cannot be guaranteed. Come check out Codex! Registration only supported on Linux (I've learned so much from you all, this is my way of giving back). However, server performance isn't great, so if there's too much traffic we might have to close registration. Get onboard quickly! |
轻微
[术语]
"Free Shared API Site"
理由: Should be 'Free API Proxy Site' to match reference | "公益站" is closer to public-benefit/free site or free proxy site; "Shared API" adds nuance not explicit in source. | '公益站' is better translated as 'Free API proxy site' or 'Charity station' in this context, though 'Free Shared' is close.
严重
[准确性]
"Registration only supported on Linux"
理由: Reference uses 'only supports registration via LINUX DO' - 'Linux' is incorrect, should be 'LinuxDO' (the forum name) | Source: "仅支持 linuxdo 注册" (only supports LinuxDo registration). Hypothesis changes it to OS-level "Linux" support, which is incorrect. | Mistranslation of 'linuxdo'. 'Linuxdo' is the name of the forum/community, not the operating system 'Linux'. The hypothesis says 'Registration only supported on Linux', which is factually incorrect based on the source. |
| Qwen/Qwen2.5-7B-Instruct | Title: [NihaoAPI] Free API Site is Open!~ Content: The main model is GPT; Claude, GLM, and Minimax are not guaranteed to be available. Loyal friends mainly come to ride Codex. Only LinuxDo registration is supported (we learned a lot from our loyal friends here, so we're giving back). However, server performance is not very good, and if there's too much enthusiasm later on, registration may be closed. Loyal friends, get on board quickly! |
严重
[准确性]
"Loyal friends mainly come to ride Codex."
理由: Mistranslation of '佬友们' - should be 'Bros' not 'Loyal friends' | '蹬 codex' in the community slang means using/abusing Codex heavily; 'ride Codex' is unclear and misses this meaning. | Mistranslation: '来蹬' means 'to use/heavily use' (like milking), not 'ride'. Reference correctly says 'heavily milk/spam Codex' |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: The hypothesis contains malformed text and fails to provide any translation of the source content. | Hypothesis contains only placeholder text '...}' instead of actual translation. No valid translation provided. | Incomplete translation. |
| Qwen/Qwen3.5-4B | Title: [NihaoAPI] Free Shared API Site is Open!~ Content: The models are mainly from GPT; Claude, GLM, and Minimax availability is not guaranteed. Come primarily to test Codex here. Registration via LinuxDo only (I learned a lot here and hope to give back to everyone). However, server performance is not great. If too many people sign up later, registration might be closed. Hurry up and join! |
轻微
[术语]
"Free Shared API Site"
理由: '公益站' is often translated as 'Free API proxy site' or 'Charity site' in this niche; 'Shared' adds an assumption. | Should be 'Free API Proxy Site' for accuracy | Should be 'Free API Proxy Site' as per reference. | Source “公益站” is closer to “public / charitable/free site”; “shared API” injects “API” not in source, though context implies it |
| Qwen/Qwen3.5-9B | Title: [NihaoAPI] Free Shared API Site is Open! Content: The models are mainly from GPT; availability of Claude, GLM, and Minimax is not guaranteed. Friends here mostly come to use Codex. Only supports LinuxDo registration (learned a lot from friends here, also a way to give back). However, server performance isn't very good; if too many people sign up later, registration might be closed. Friends, hurry up and join! | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: "【NihaoAPI】Free Shared API Site is Now Open!" Content: The model is mainly GPT family, Claude, GLM, Minimax not guaranteed to be available, friends mainly come to hit Codex. Only supports Linux registration (here we have learned a lot from you guys, which also counts as feedback) However, server performance isn't very good, if it gets too popular, we might close registration. Friends, hop on quickly! |
严重
[准确性]
"Only supports Linux registration"
理由: Translates 'linuxdo注册' as 'Linux registration' but should be 'LINUX DO registration' referring to the specific forum/community. | Misinterpretation of 'linuxdo'. It refers to the specific forum 'LINUX DO', not the operating system Linux. | Source says only supports registration via LINUX DO forum, not generic Linux OS registration; meaning is changed. |
| google/gemma-3-12b-it | Title: [NihaoAPI] Public Proxy Site is Now Open!\nContent: The models mainly include those from the GPT family. Claude, GLM, and Minimax availability are not guaranteed. Experienced users primarily come to use Codex. Only supports LinuxDo registration (I’ve learned a lot here with experienced users, and this is also a way to give back to everyone). However, server performance isn’t very good, so registration may be closed if there’s too much enthusiasm. Experienced users, hop on quickly! |
严重
[准确性]
"蹬 codex"
理由: '蹬' (deng) is slang, possibly meaning to 'kick' (use heavily/abuse) or just 'use'. The reference interprets it as 'milk/spam'. 'Use' is safe but misses the slang flavor. | Translated as “use Codex” instead of slang meaning “heavily milk/spam Codex”, losing intended tone and degree. | Mistranslation of '蹬' (use heavily/spam) as 'use Codex' - reference uses 'heavily milk/spam Codex' which is more accurate |
| google/gemma-3-1b-it | Title: 【NihaoAPI】公益站开张啦~ |
致命
[准确性]
"Title: 【NihaoAPI】公益站开张啦~"
理由: The hypothesis only contains the untranslated source title. The entire content paragraph is missing. No translation was performed. | Hypothesis only repeats the source title without providing translation. Reference shows proper translation as '[NihaoAPI] Free API Proxy Site is Open~'. | Only the title is given; the body content about models, registration limits, server performance, and call to action is completely omitted. | The hypothesis only provides the title translation but is missing the content translation entirely. | No content translation provided. |
| google/gemma-3-4b-it | Title: [NihaoAPI] Free Shared API Site Opening! | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [NihaoAPI] Public Proxy Site is Now Open! Content: The models mainly include those from the GPT family. Claude, GLM, and Minimax availability are not guaranteed. Experienced users can primarily use Codex. Only supports LinuxDo registration (I learned a lot here with experienced users, and this is also a way to give back to everyone). However, the server performance is not very good, so if there is too much enthusiasm, registration may be closed. Experienced users, please hurry up and join! |
轻微
[术语]
"Public Proxy Site"
理由: '公益站' implies 'Free/Charity' site, 'Public' is less precise. 'Experienced users' for '佬友' misses the specific community slang 'Bros'. 'heavily milk/spam' vs 'use' changes the tone significantly. | Should be '免费API代理站点' instead of 'Public Proxy Site' | Should be 'Free API Proxy Site' not 'Public Proxy Site'. The source indicates '公益站' (free/public welfare site) providing API services. |
| google/translategemma-4b-it | Title: 【NihaoAPI】Free Shared API Site is Now Open! | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 【NihaoAPI】公益站开张啦~' in en |
严重
[准确性]
"Translation of 'Title: 【NihaoAPI】公益站开张啦~' in en"
理由: Hypothesis is placeholder text, not actual translation; only translates title, ignores content | The translation only provides the title and not the content. | The hypothesis only attempts to describe the translation of the title (and fails to even do that correctly by outputting a meta-string), completely omitting the translation of the main content. |
| tencent/HY-MT1.5-7B | Title: [NihaoAPI] The Public Welfare Site is Open! | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: How can I set up my own Codex token pool? Content: The reason is that the Codex I bought requires 2 cents per token, but I saw friends with much cheaper ones and they seem real. In addition, I like to try things out, so I wanted to ask how you guys set up your own Codex token pool. From registering on machines to automatically switching accounts, as well as the ban policy, I'm looking for any open-source projects that support this. |
严重
[准确性]
"token pool"
理由: Incorrect translation: '号池' means 'account pool', not 'token pool'. '2毛一刀' means '0.2 RMB per dollar' (pricing), not '2 cents per token'. '注册机' means 'account generator/bot', not 'registering on machines'. '佬友' means 'bros', not 'you guys'. | The source's '号池' is an account pool, not a token pool; the translation conflates the billing unit with the concept of an account. | Mistranslation of '号池' (account pool) as 'token pool'. In this context, '号' refers to user accounts, not API tokens. | Should be 'account pool' not 'token pool'. The source discusses account management, not token management.
严重
[准确性]
"requires 2 cents per token"
理由: '2 毛一刀' is 0.2 RMB per USD, not two cents per token; both the pricing basis and the amount are wrong. | Mistranslation of '2 毛一刀'. '2 毛' is 0.2 RMB, and '一刀' is slang for 1 USD. The source means 0.2 RMB per 1 USD of usage, whereas the translation incorrectly states 2 cents per token. | Should be '0.2 RMB per dollar' not '2 cents per token'. The source refers to pricing in RMB currency. | Incorrect translation of the cost from 0.2 RMB per token to 2 cents per token. |
| CohereLabs/tiny-aya-water | Title: How can I set up my own Codex account pool? Content: The reason is that the Codex I bought needs 2 cents per account, but I saw friends with much cheaper ones and they seem real. Plus, I'm quite fond of trying new things, so I wanted to ask the guys how to set up their own Codex account pool. From registering through the registration machine to automatically switching accounts by polling, plus the ban policy, I've been asking around if there are any open-source projects already available. |
严重
[准确性]
"needs 2 cents per account"
理由: "2 毛一刀" is 0.2 RMB per USD of credit, not 2 cents per account. Pricing unit and meaning are wrong. | Mistranslation of '2 毛一刀'. '2 毛' refers to 0.2 RMB, and '一刀' is slang for one US dollar (or one unit of service costing $1). The source means the cost is 0.2 RMB per dollar of credit/service, not 2 cents per account. | Mistranslation of '2 毛一刀' - should be '0.2 RMB per dollar' not '2 cents per account' |
| Qwen/Qwen2.5-14B-Instruct | Title: Friends, how do you build your own Codex account pool? Content: The reason is that I bought Codex at two cents per use, but my friend's seems much cheaper and appears to be genuine. Since I like trying new things, I want to ask my friends how to set up a Codex account pool. From registration through automatic polling and switching accounts, including anti-ban policies, I would like to know if there are already open-source projects available? |
严重
[准确性]
"two cents per use"
理由: "2 毛一刀" is 0.2 RMB per USD, not two cents per use. Monetary unit and basis are wrong. | Mistranslation of '2 毛一刀'. '一刀' usually means 'one dollar' worth of usage or 'one cut/use'. '2 毛' is 0.2 RMB. The reference interprets it as '0.2 RMB per dollar'. 'Two cents per use' might be numerically close but misses the 'per dollar of credit' nuance if that's the pricing model, or simply the currency unit. | Should be '0.2 RMB per dollar' not 'two cents per use' - misses the meaning | Reference uses 'costs 0.2 RMB per dollar' - 'two cents per use' is incorrect translation of '2毛一刀' |
| Qwen/Qwen2.5-7B-Instruct | Old friends, how do you set up your own Codex account pool? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Hypothesis contains only placeholder text '...}' instead of actual translation. No valid translation provided. | Incomplete translation. | The hypothesis contains no translated text, only malformed JSON fragments and an HTML tag, resulting in a complete failure to convey the source meaning. |
| Qwen/Qwen3.5-4B | Title: Folks, how do I set up my own Codex token pool? Content: The reason is that the Codex I bought costs $0.20 per kill, but looking at friends' accounts, they seem very cheap and appear to be legitimate. Plus, I love trying new things, so I'd like to ask the community members here: How can I build a Codex token pool myself? From registering via registration machines to automatically polling and switching accounts, including policies on account bans—I want to know if there are any existing open-source projects supporting this? |
致命
[准确性]
"0.20 per kill"
理由: Should be '0.2 RMB per dollar' not '$0.20 per kill' | Mistranslation of '2 毛一刀'. In this context, '一刀' is slang for a transaction or usage instance, not 'kill'. Also, the currency is likely RMB (0.2 RMB), not USD ($0.20), drastically altering the cost implication. | “2 毛一刀” means 0.2 RMB per USD; “per kill” is a severe mistranslation |
| Qwen/Qwen3.5-9B | Title: Friends, how do I build my own Codex account pool? Content: The reason is that the Codex I bought costs 20 cents per request, but seeing friends use it very cheaply and seemingly authentic. Given my love for trying new things, I'd like to ask experienced users how to set up a Codex account pool myself? From registration machines to automatic polling and switching accounts, including handling ban policies - are there any ready-made open-source projects available? |
严重
[准确性]
"20 cents per request"
理由: '2毛一刀' means '0.2 RMB per dollar', not '20 cents per request'. Misinterprets the pricing model | Misreads “2 毛一刀” (0.2 RMB per USD of credit); translated as per request rather than per dollar/amount, changing the billing unit. | The source '2 毛一刀' refers to 0.2 RMB per usage/cut. '20 cents' implies USD without context, and 'per request' is an interpretation not explicitly in the source, though likely intended. The currency ambiguity is a significant accuracy issue. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Friends, how to build your own Codex pool? | 通过 (无共识错误) |
| google/gemma-3-12b-it | Hey folks, how do I build my own Codex account pool? | 通过 (无共识错误) |
| google/gemma-3-1b-it | The reason is that I bought Codex and it costs 2 yuan per token. However, my friend seems to be very cheap, and it seems to be real. Adding the policy on registration, switching accounts, and auto polling, I want to ask the佬友们 how to build a Codex token pool myself. From registering, to automatically switch accounts, and adding the blacklist policy, I want to know if there are any existing open-source projects that support this? |
严重
[准确性]
"2 yuan per token"
理由: Mistranslation of cost and addition of irrelevant policy details. | Mistranslation of '2 毛一刀'. '2 毛' is 0.2 RMB, not 2 yuan. '一刀' in this context likely refers to a unit of usage or dollar equivalent, not 'token'. The reference correctly identifies it as '0.2 RMB per dollar'. | The hypothesis is in English instead of Chinese (target language is zh). It also contains untranslated Chinese '佬友们' and mistranslations: '号池' (account pool) as 'token pool', '封号政策' (ban policy) as 'blacklist policy'. | Source says "2毛一刀" (~0.2 RMB per USD); "2 yuan per token" is a large quantitative and unit error. | '2毛一刀' means '0.2 RMB per dollar' not '2 yuan per token'. Misinterprets the pricing unit. |
| google/gemma-3-4b-it | Title: Fellow players, how do I set up my own Codex account pool? | 通过 (无共识错误) |
| google/translategemma-12b-it | Hey everyone, how can I build my own Codex account pool? |
轻微
[风格]
"Hey everyone, how can I build my own Codex account pool?"
理由: The translation uses 'everyone' instead of the more colloquial 'bros', which slightly alters the tone. | Accurate meaning-wise but drops slang “佬友们” (bros) and explanatory context about costs and automation; stylistic/community flavor loss. | '佬友们' is specific community slang ('Bros'), 'Hey everyone' is too generic. Missed translating the entire Content section. |
| google/translategemma-4b-it | Title: Fellow users, how can I set up my own Codex account pool? |
轻微
[风格]
"Fellow users"
理由: The translation 'Fellow users' fails to capture the informal, community-specific slang tone of '佬友们' (lǎo yǒu men), which is better rendered as 'Bros' or 'Fellow experts' in this context. The reference 'Bros' aligns better with the source's register. | Loses in-group slang tone of “佬友们 / bros”; still understandable but less colloquial-matching | Should be 'Bros' |
| tencent/HY-MT1.5-1.8B | Can you give me some tips on how to build your own Codex account pools, friends? |
严重
[准确性]
"Can you give me some tips on how to build your own Codex account pools, friends?"
理由: Hypothesis is paraphrased response rather than translation of source content; missing key details about cost comparison, ban policies, and open-source projects | The hypothesis translates only the title (and loosely at that), completely omitting the entire content paragraph which details the user's specific situation and questions. | The translation does not match the source content. |
| tencent/HY-MT1.5-7B | "en": "How can friends build their own Codex number pools?", "zh-TW": "朋友们如何自建 Codex 号池呢?」 , "ja": "皆さん、どのようにして自分で Codex 号池を構築でればいいのでしょうか?", "ko": "여러분, 어떻게 하면 자신만의 Codex 토큰 풀을 만들 수 있을까요?", "de": "Freunde, wie können Sie Ihre eigenen Codex-Zahlungspools erstellen?", "fr": "Comment les amis peuvent-ils créer leurs propres pools de codes Codex ?", "it": "Amici, come possono creare i propri pool di codici Codex?"}|im_end|> |
严重
[准确性]
""en": "How can friends build their own Codex number pools?""
理由: The hypothesis is a raw JSON dump of multiple languages, not a Chinese translation. It fails to translate the source. | “佬友们,如何自建 Codex 号池呀?” is closer to “Bros, how do I build my own Codex account pool?”; “friends” and “number pools” weaken slang and ‘account’ nuance. | Does not match the reference content. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 100 Codex Prunings, 25 per set |
致命
[准确性]
"100 Codex Prunings"
理由: '普号' means 'ordinary/standard account'; mistranslating it as 'Prunings' completely changes the meaning. | Should be 'standard accounts' instead of 'Prunings'. | Incorrect translation: '普号' means 'standard/free account', not 'Prunings'. The translation is nonsensical. | Severe mistranslation of '普号' (standard/regular accounts) as 'Prunings'. | Should be 'Codex standard accounts' not 'Codex Prunings'. '普号' means standard/free accounts, not prunings. |
| CohereLabs/tiny-aya-water | Share your thoughts on the new features and improvements. |
致命
[准确性]
"Share your thoughts on the new features and improvements."
理由: Completely unrelated content - hypothesis is entirely different from source and reference. Should be about sharing 100 Codex accounts | Completely unrelated to the source. The original is about sharing 100 Codex standard accounts for testing; none of that is reflected. | Complete hallucination. The hypothesis bears no resemblance to the source text which discusses sharing 100 Codex accounts. |
| Qwen/Qwen2.5-14B-Instruct | Title: 100 codex standard accounts, 25 per group Content: Sharing 100 Codex standard accounts with everyone. Please test and let me know if there are any issues. |
轻微
[流畅性]
"Sharing 100 Codex standard accounts with everyone. Please test and let me know if there are any issues."
理由: Misses the nuance of '给我指正' (correct me/point out errors). 'Let me know' is a bit weak. | Missing 'with you bros' and 'to correct them' - incomplete translation of the content | Reference uses 'Sharing 100 Codex standard/free accounts with you bros' - includes 'free' and 'you bros' |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: 100 Codex accounts, 25 per group\nContent: Sharing 100 Codex accounts for everyone to test. Please let me know if there are any issues.", "zh-TW": "標題:100個Codex帳號,每組25個\n內容:分享100個Codex帳號供各位測試,如有問題請告知。", "ja": "タイトル:100のCodexアカウント、1組25個\n内容:100のCodexアカウントをシェアしますので、テストして不具合があれば教えてください。", "ko": "제목: 100개의 Codex 계정, 25개씩 그룹화\n내용: 모든 분들이 테스트해 보실 수 있도록 100개의 Codex 계정을 공유합니다. 문제가 있으면 알려주세요.", "de": "Titel: 100 Codex-Konten, 25 pro Gruppe\nInhalt: Teile 100 Codex-Konten mit, damit alle testen können. Wenn es Probleme gibt, melden Sie sie bitte.", "fr": "Titre : 100 comptes Codex, 25 par groupe\nContenu : Partagez 100 comptes Codex avec tous pour tester. Veuillez me signaler tout problème si vous en avez.", "it": "Titolo: 100 account Codex, 25 per gruppo\nContenuto: Condivido 100 account Codex con tutti per testarli. Se c\'è qualche problema, fatevene notizia." } | 通过 (无共识错误) |
| Qwen/Qwen3-14B | 100 Codex Normal Accounts, 25 per Group | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: 100 common Codex items, 25 per group Content: Sharing 100 common Codex items with everyone for testing. Please let me know if you find any issues. |
严重
[准确性]
"100 common Codex items"
理由: Mistranslation of '普号' (standard/free accounts) as 'items'. These are user accounts, not physical items. | “codex普号” refers to standard/normal accounts, not “items” | Should be 'Codex standard/free accounts' not 'common Codex items' |
| Qwen/Qwen3.5-9B | Title: 100 codex common numbers, 25 per group Content: Sharing 100 Codex common numbers with everyone for testing to see if there are any issues; please correct me if you find problems. |
严重
[术语]
"common numbers"
理由: “普号” in this context means standard/regular accounts, not “numbers”. This misleads the object type (accounts vs numbers). | '普号' means 'standard/free accounts', not 'common numbers'. Should be 'standard accounts' | '普号' means 'standard accounts' or 'regular accounts'. 'Numbers' is a mistranslation of '号' (account/number) in this context; they are accounts, not phone numbers. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: 100 Codex Public Numbers, 25 at a Time Content: Sharing 100 Codex public numbers with everyone, test them and let me know if there's any issue. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: 100 Codex generic codes, 25 per group Content: Sharing 100 Codex generic codes for everyone to test and see if there are any issues. Please point out any problems. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Share 100 Codex PIDs with the community. |
严重
[准确性]
"Share 100 Codex PIDs"
理由: Mistranslation of '普号' (standard/free accounts) as 'PIDs'. The hypothesis omits the request for testing and feedback ('大家测试一下看看有没有问题...'). The title is also missing. | The hypothesis is in English instead of Chinese (target language is zh). It also mistranslates '普号' (standard/free accounts) as 'PIDs' and is missing the content translation. | Incomplete translation. Missing key information about 'standard accounts' and 'in groups of 25'. 'PIDs' is incorrect terminology; should be 'accounts'. | "普号" means standard/basic accounts, not "PIDs"; also omits grouping of 25 per group and the request to test and report issues. | Missing key information about testing and feedback. |
| google/gemma-3-4b-it | Title: 100 Codex P numbers, 25 sets per group Content: Sharing 100 Codex P numbers to all seniors, please test and let me know if there are any issues, please point them out to me if you find any. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: 100 Codex generic codes, 25 per group Content: Sharing 100 Codex generic codes for everyone to test and see if there are any issues. Please let me know if you find any problems. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: 100 Codex "P" numbers, grouped into sets of 25 |
严重
[准确性]
""P" numbers"
理由: Complete mistranslation of '普号' (standard/regular accounts). 'P numbers' is nonsensical. | Should be 'Codex standard accounts' not 'Codex "P" numbers'. The Chinese '普号' means standard/free accounts, not 'P numbers'. | “普号” means standard/regular accounts, not letter P; misleads about meaning of account type |
| tencent/HY-MT1.5-1.8B | Login |
轻微
[准确性]
"Login"
理由: Hypothesis is completely unrelated to source content; source is about 100 Codex accounts, hypothesis says Login | Single word 'Login' is unrelated to the source about sharing 100 Codex accounts; completely incorrect meaning. | The translation does not match the source content. | The hypothesis outputs 'Login', which is completely unrelated to the source text about sharing Codex accounts. It appears to be a hallucination or a retrieval of a wrong label. |
| tencent/HY-MT1.5-7B | Title: 100 Codex Universal Numbers, grouped in sets of 25 | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: A little chicken is currently only using a ladder, feels like there is still room for deployment, wondering what projects are worth deploying |
严重
[准确性]
"A little chicken"
理由: Should be 'I have a small VPS that is currently only running a VPN/proxy'. '小小鸡' is slang for small VPS, not literal chicken, and '梯子' means VPN/proxy, not ladder. | '小小鸡' is a playful nickname for a low-spec VPS ('小鸡'); rendering it literally as 'little chicken' would be read as an animal rather than a server. | Incorrect translation: '小小鸡' is slang for 'small VPS', not 'little chicken'. '搭梯子' means 'use VPN/proxy', not literal 'using a ladder'. | Mistranslation of slang '小小鸡' (small VPS/server) as a literal animal. |
| CohereLabs/tiny-aya-water | Title: A little chicken is currently only using a ladder, feels like there is still room for deployment, wondering what projects are worth deploying |
严重
[准确性]
"A little chicken is currently only using a ladder, feels like there is still room for deployment, wondering what projects are worth deploying"
理由: Literal translation is confusing - should be 'I have a small VPS that is currently only running a VPN/proxy' not 'A little chicken is currently only using a ladder' | "小小鸡" is slang for a small VPS/server and "搭梯子" for running a VPN/proxy. Literal rendering as "little chicken" and "ladder" is misleading and not idiomatic English for the intended meaning. | Mistranslation of slang '小小鸡'. In this context, it refers to a 'small VPS' (virtual private server), not a literal chicken. |
| Qwen/Qwen2.5-14B-Instruct | Title: The cheap VPS currently only runs a proxy, feels like there's still capacity left, wondering what projects are worth deploying at the moment Content: The cheap VPS currently only runs a proxy, feels like there's still capacity left, wondering what projects are worth deploying at the moment | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | A small chicken currently only搭建梯子,感觉还有余力,想问问目前有什么值得部署的项目Translation of '有只小小鸡目前只搭梯子,感觉还有余力,想问问目前有什么值得部署的项目' in en | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Title: I have a cheap VPS that is currently only used for tunneling. I feel there's still some capacity left, so I want to ask if there are any worthwhile projects to deploy right now Content: I have a cheap VPS that is currently only used for tunneling. I feel there's still some capacity left, so I want to ask if there are any worthwhile projects to deploy right now | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: I have a cheap VPS that is currently only running on Tor, feels like there's still some capacity left. Could anyone recommend any projects worth deploying? |
严重
[准确性]
"running on Tor"
理由: Severe hallucination/mistranslation. '搭梯子' means 'setting up a proxy/VPN', not 'running on Tor'. | Should be 'running a VPN/proxy' not 'running on Tor' | “只搭梯子” means only running a VPN/proxy; Tor is a specific technology not mentioned and changes the meaning |
| Qwen/Qwen3.5-9B | Title: I have a cheap VPS currently only running proxies, feeling there's still capacity left; wondering what projects would be worth deploying Content: I have a cheap VPS currently only running proxies, feeling there's still capacity left; wondering what projects would be worth deploying | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: There's just a small cheap VPS right now only setting up a ladder, feels like there's still some strength left, wants to ask what projects are worth deploying now Content: There's just a small cheap VPS right now only setting up a ladder, feels like there's still some strength left, wants to ask what projects are worth deploying now | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: A cheap VPS is currently just getting started and feels like there's still room for improvement, wondering what projects are worth deploying right now. Content: A cheap VPS is currently just getting started and feels like there's still room for improvement, wondering what projects are worth deploying right now. | 通过 (无共识错误) |
| google/gemma-3-1b-it | The title is "A small little chicken currently only uses a ladder, and I feel there's still room for improvement, so I want to ask what worthwhile projects are currently available. |
严重
[准确性]
"A small little chicken"
理由: "小小鸡" is slang for a small VPS/server, not a literal chicken. | No meaningful translation provided. | The hypothesis is in English instead of Chinese (target language is zh). It also mistranslates '梯子' (VPN/proxy) as 'ladder' and '小小鸡' (VPS) as 'small little chicken'. | '小小鸡' is slang for 'small VPS' not 'small little chicken'. '搭梯子' means 'running a VPN/proxy' not 'uses a ladder'. | Includes meta-text ('The title is'). Mistranslates slang: '小小鸡' (small VPS) as 'little chicken', '搭梯子' (set up a proxy/VPN) as 'uses a ladder'. '余力' (spare capacity) is mistranslated as 'room for improvement'. |
| google/gemma-3-4b-it | Title: There’s a small VPS currently just using a ladder, feeling like there’s still some leeway, wondering what projects are worth deploying right now. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: A small VPS is currently only using minimal resources and seems to have capacity left. I would like to ask what projects are worth deploying. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: I have a small VPS currently only using a ladder, feeling like there is still room for more, would like to know what projects are worth deploying now? |
轻微
[术语]
"ladder"
理由: Should be 'running a VPN/proxy' not 'using a ladder'. The Chinese '搭梯子' is a colloquial term for setting up a VPN/proxy, not literally a ladder. | Calque from Chinese “梯子”; should be rendered as VPN/proxy for clarity in English | Should be 'VPN/proxy' |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '有只小小鸡目前只搭梯子,感觉还有余力,想问问目前有什么值得部署的项目' in en", "zh-TW": "Translation of '有只小小鸡目前只搭梯子,感覺還有餘力,想問問目前有什麼值得部署的項目' in zh-TW", "ja": "Translation of '有只小小鸡目前只搭梯子,感じるにまだ余力がある、現在何を展開するのが良いか尋ねたい' in ja", "ko": "Translation of '有只小小鸡目前只搭梯子,느낌이 아직 여력이 있어 현재 무엇을 배치하는 것이 좋을지 물어보고 싶다' in ko", "de": "Translation of '有只小小鸡目前只搭梯子,ยัง 느끼기에 여전히 여유가 있고 현재 무엇을 배치하는 것이 좋은지 물어보고 싶다' in de", "fr": "Translation of '有只小小鸡目前只搭梯子,아직도 여유가 있다고 느끼며 현재 무엇을 배치하는 것이 좋을지 물어보고 싶다' in fr", "it": "Translation of '有只小小鸡目前只搭梯子,아직도 여유가 있다고 느끼며 현재 무엇을 배치하는 것이 좋을지 물어보고 싶다' in it} |
致命
[准确性]
"entire hypothesis"
理由: Hypothesis provides metadata labels for multiple language translations rather than actual translation content. The format is incorrect. | Hypothesis is placeholder text, not actual translation | All segments are meta strings 'Translation of ... in X' and some even include mixed/incorrect languages; there is no actual translation of the Chinese source despite a clear reference meaning. |
| tencent/HY-MT1.5-7B | "en": "A small chick currently only uses a ladder and feels it still has some extra energy. It wants to know what projects are worth deploying right now.", "zh-TW": "有一隻小雞目前只在使用梯子,感覺還有一些餘力,想知道現在有哪些項目值得部署。,", "ja": "一隻小さな鶏目前正在使用梯子,感覺還有一些餘力,想要知道現在有哪些項目值得部署。,", "ko": "한 마리 작은 병아리가 현재로서는 사다리만을 사용하고 있으며, 아직도 여유가 있다고 느끼고 있습니다. 어떤 프로젝트들이 지금 배포하기에 적합한지 알고 싶어합니다。,", "de": "Ein kleines Huhn verwendet derzeit nur eine Leiter und fühlt sich noch ausreichend energiegeladen. Es möchte wissen, welche Projekte momentan geeignet sind für die Bereitstellung。,", "fr": "Un petit poulet utilise actuellement seulement une échelle et se sent encore suffisamment en forme. Il veut savoir quels projets sont appropriés pour la mise en œuvre pour l’instant。,", "it": "Un piccolo pollo utilizza attualmente solo una scala e si sente ancora abbastanza in forze. Vuole sapere quali progetti sono adatti per l’attuazione in questo momento。,", "nl": "Een klein kuiken gebruikt momenteel alleen een ladder en voelt zich nog steeds voldoende energiek. Het wil weten welke projecten er zijn die nu geschikt zijn voor implementatie。,", "pt": "Um pequeno frango está atualmente usando apenas uma escada e sente que ainda tem energia extra. Gostaria de saber quais projetos são dignos de serem implementados agora。,", "ru": "Одна маленькая курица в настоящее время использует только лестницу и чувствует, что у неё ещё есть запас сил. Она хочет узнать, какие проекты стоит развертывать сейчас。,", "es": "Un pollito pequeño utiliza actualmente solo una escalera y se siente aún con suficiente energía. Quiere saber qué proyectos son adecuados para su implementación en este momento。,", "sv": "Enligt detta inlägg använder ett litet kyckling för närvarande endast en stegplatta och känner sig fortfarande fyllt av energi. Den vill veta vilka projekt som är lämpliga att distribuera just nu。,", "fi": "En liten kyckling använder för närvarande endast en stegplatta och känner sig fortfarande energisk. 
Den vill veta vilka projekt som är lämpliga att distribuera just nu。,", "dt": "Ein kleines Huhn verwendet derzeit nur eine Leiter und fühlt sich noch ausreichend energiegeladen. Es möchte wissen, welche Projekte momentan geeignet sind für die Durchführung。,", "hu": "Egy küçük háromszárú köttőbennek számítólag csak egy létrő kullan az ülső része, és értese, hogy még enough energiat tartá magában. Tudni szeretné, mely projektek érdemes lenne núint elindítani。,", "cs": "Egy nagyörszárú köttőbennek, amely számítósgép használ, aktuálnally chỉ uses one ladder, and it realizes that it still has some energy left. It would like to know which projects are suitable for deployment at this time。,", "sk": "Tento blogbeitrag uvádzi, že jedna malá kuřatka momentáně používá pouze žebřík a má pocit, že má ještě dostatek sil. Chce vědět, které projekty jsou vhodné ke spuštění právě teď。,", "sl": "Tento blog příspěvek uvádí, že jedno malé kuře momentálně používá pouze žebřík a má pocit, že má ještě dostatek síly. Chce vědět, které projekty jsou vhodné ke | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Description: The server is currently down for maintenance. |
致命
[准确性]
"The server is currently down for maintenance."
理由: Completely unrelated translation. The source asks about VPS connection issues and GFW blocking, not server maintenance. | Doesn't match the reference content about connectivity issues. | The hypothesis is completely unrelated to the source. It should be a translation of the title and content about VPS connection issues and GFW blocking. | Completely unrelated to the source, which asks about a VPS timing out, whether it has been blocked by the GFW, and how to self-check; the content is entirely wrong. | Hallucination. The hypothesis completely ignores the source text and invents a status message. |
| CohereLabs/tiny-aya-water | "en": "Title: Do you know anyone who understands this? I bought VPS from Linode and suddenly can't connect anymore, timeout error reported, is it blocked?", "zh-TW": "標題:有沒有人懂這方面的知識?我在 Linode 購買的 VPS,突然無法連線,出現 timeout 錯誤,是否被封鎖?如何檢查?", "ja": "タイトル:この分野の専門家はいますか?私はLinodeからVPSを購入しましたが、突然接続できなくなりました。タイムアウトエラーが発生しています。これはブロックされているのでしょうか?自己診断方法はありますか?", "ko": "제목: 이 분야의 전문가가 있나요? 저는 Linode에서 VPS를 구매했지만 갑자기 연결이 끊어졌습니다. 타임아웃 오류가 발생했습니다. 이것이 차단되었나요? zelfdiagnose 방법은 무엇인가요?", "de": "Titel: Ist jemand da, der dieses Thema versteht? Ich habe ein VPS von Linode gekauft und kann plötzlich nicht mehr verbinden. Es wird eine Timeout-Fehler报报报 [the character '报' repeats until the output is cut off] |
严重
[准确性]
"Linode"
理由: Mistranslation of '搬瓦工'. '搬瓦工' is the specific brand name 'BandwagonHost'. Linode is a different provider. | The source clearly says 搬瓦工 (BandwagonHost), not Linode. | Incorrect vendor - should be 'BandwagonHost' not 'Linode'. Source mentions '搬瓦工' which is BandwagonHost |
| Qwen/Qwen2.5-14B-Instruct | Title: Is there anyone who understands this? I bought a VPS from BandwagonHost, and suddenly can't connect, getting a timeout error. Could it be blocked? How do I check? Content: Is there anyone who understands this? I bought a VPS from BandwagonHost, and suddenly can't connect, getting a timeout error. Could it be blocked? How do I check? |
轻微
[准确性]
"Could it be blocked?"
理由: Source implies being blocked by GFW; omission of "by the firewall/GFW" slightly reduces specificity. | Reference uses 'Was it blocked by the GFW' - more specific about what blocked it | Should be 'Was it blocked by the GFW?' to be more specific |
| Qwen/Qwen2.5-7B-Instruct | Is there anyone who understands this? I can't connect to my VPS from搬瓦工; it's showing a timeout error. Is it firewalled? How can I check? |
严重
[准确性]
"Is it firewalled?"
理由: Slightly different: '被墙' specifically means 'blocked by the Great Firewall (GFW)', not just 'firewalled'. Reference correctly says 'blocked by the GFW' | '被墙了' specifically refers to being blocked by the Great Firewall (GFW); 'firewalled' is too generic and misses the China/GFW nuance. | Mistranslation - should be 'blocked by the GFW' not 'firewalled' |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: The hypothesis is empty/malformed and contains no translation of the source text. | Hypothesis contains only placeholder text '...}' instead of actual translation. No valid translation provided. | Incomplete translation. |
| Qwen/Qwen3.5-4B | Title: Does anyone here understand this? I bought a VPS from Cloudflare via GSO, and suddenly it's not connecting anymore. It reports a timeout error. Is it blocked by GFW? How do I self-check? |
致命
[准确性]
"Cloudflare via GSO"
理由: Should be 'BandwagonHost' not 'Cloudflare via GSO' | Complete hallucination. '搬瓦工' is 'BandwagonHost'. The hypothesis invents 'Cloudflare' and 'GSO'. | “搬瓦工” is BandwagonHost; Cloudflare/GSO are completely incorrect entities |
| Qwen/Qwen3.5-9B | Title: Does anyone know about this? I bought a VPS from BandwagonHost, and suddenly can't connect anymore, getting a timeout error. Is it blocked by GFW? How to self-check? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Have you understood this aspect? I bought a VPS from Waipool, suddenly couldn't connect, timeout error. Is it blocked by firewall? How to check? |
严重
[准确性]
"Waipool"
理由: Mistranslates '搬瓦工' as 'Waipool'. Reference shows it should be 'BandwagonHost' which is the actual company name. | Mis-translation of ‘搬瓦工’, which refers to BandwagonHost; ‘Waipool’ is incorrect. | Mistranslation of '搬瓦工' (BandwagonHost). 'Waipool' is a nonsensical literal translation. |
| google/gemma-3-12b-it | Title: Does anyone here understand this? I bought a VPS from BandwagonHost and suddenly can’t connect to it anymore, getting a timeout error. Is it blocked by the GFW? How do I self-check? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of '有没有懂这个方面的佬?' in en |
致命
[准确性]
"Translation of '有没有懂这个方面的佬?' in en"
理由: The hypothesis contains only meta-text describing the translation of the first sentence. The rest of the source text (VPS, BandwagonHost, timeout, GFW, self-check) is completely missing. | No meaningful translation provided. | The hypothesis is not a proper translation; it indicates it is a translation to English but the target language is Chinese (zh). Also, the content is missing entirely. | Hypothesis does not provide actual translation, only states 'Translation of...' without delivering the translation. | No English translation phrase is provided; it's only a description of the translation task. |
| google/gemma-3-4b-it | Title: Does anyone understand this area? I bought a VPS from a scraper, and suddenly I can’t connect anymore, with a timeout error. Is it being blocked? How to self-check? |
致命
[准确性]
"from a scraper"
理由: Mistranslation of '搬瓦工' (BandwagonHost, a specific VPS provider). 'Scraper' is completely wrong. | Severe mistranslation: “搬瓦工” is the VPS provider BandwagonHost, not ‘scraper’. | Hypothesis translates '搬瓦工' as 'scraper' but reference correctly identifies it as 'BandwagonHost', which is a specific VPS provider name |
| google/translategemma-12b-it | Title: Does anyone understand this? I bought a VPS from Vultr, and suddenly I can't connect to it anymore. It's showing a timeout error. Is it blocked by the GFW? How can I check? |
致命
[准确性]
"Vultr"
理由: Should be 'BandwagonHost' not 'Vultr'. The source explicitly mentions '搬瓦工' (BandwagonHost). | Source explicitly says 搬瓦工 (BandwagonHost); Vultr is a different provider, changing a factual detail. | Hallucination/Error: '搬瓦工' is BandwagonHost, not Vultr. This is a factual error changing the service provider. |
| google/translategemma-4b-it | Title: Does anyone know about this? I bought a VPS from a tiling contractor, and suddenly I can't connect anymore, with a timeout error. Is it being blocked? |
致命
[准确性]
"tiling contractor"
理由: Should be 'from BandwagonHost' not 'from a tiling contractor'. '搬瓦工' is the name of a VPS provider, not a tiling contractor. This is a severe mistranslation. | Misinterprets “搬瓦工” (BandwagonHost VPS provider) as a construction-related term; changes subject entirely | Mistranslation of '搬瓦工' (BandwagonHost), a specific VPS provider. Translated literally as a construction worker. |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
严重
[准确性]
"Translation of '登录' in en"
理由: The hypothesis translates the word 'Login' which is not in the source, and completely fails to translate the actual source text regarding VPS connection issues. | Hypothesis is placeholder text, not actual translation; completely unrelated to source content about VPS timeout and GFW | The translation does not match the source content. |
| tencent/HY-MT1.5-7B | Does anyone understand this? I bought a VPS from a vendor, but suddenly I can’t connect anymore. It keeps showing a timeout error. Could it be that my account has been blocked by some security measure? How can I check this myself? |
严重
[准确性]
"from a vendor"
理由: Omission of specific entity '搬瓦工' (BandwagonHost), replaced with generic 'vendor'. | Source specifies BandwagonHost (搬瓦工) which is an important detail; replacing with generic “from a vendor” omits this. | The specific vendor 'BandwagonHost' is omitted.
严重
[准确性]
"my account has been blocked by some security measure"
理由: Misinterprets '被墙了' (blocked by GFW/Great Firewall). Hypothesis suggests account blocking, but reference clarifies it means network blocking by the GFW. | Source asks whether it was blocked by the GFW (Great Firewall), not a generic security block. | Mistranslation of '被墙' (blocked by the Great Firewall/GFW) as a generic security measure. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: cocopilot喂饭级教程,搭配oaipro,再也不加班! | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: cocopilot喂饭级教程,搭配oaipro,再也不加班! Content: 1 Clone project git clone ... 2 Deployment methods cocoliot supports docker deployment, local deployment, and can be used on both linux and windows. 2.1 Method one Docker deployment (recommended) | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: cocopilot Feeding Level Tutorial, Pairing with oai-pro, No More Overtime! Content: 1 Clone project git clone ... 2 Deployment Methods cocopilot supports Docker deployment and local deployment, works on both Linux and Windows. 2.1 Method One Docker Deployment (Recommended) |
严重
[准确性]
"Feeding Level Tutorial"
理由: Should be 'Spoon-feeding (step-by-step) tutorial' not 'Feeding Level Tutorial' | Reference uses 'Spoon-feeding (step-by-step) tutorial' - '喂饭级' is slang for step-by-step | Mistranslation of '喂饭级'. It means 'spoon-feeding' or 'step-by-step' (very easy to follow). 'Feeding Level' is nonsensical. |
| Qwen/Qwen2.5-7B-Instruct | Title: cocopilot Feeding Level Tutorial, Paired with oaipro, Never Work Overtime Anymore! Content: 1 Clone Project git clone ... 2 Deployment Method cocoliot supports Docker deployment and local deployment; both Linux and Windows can be used. 2.1 Method One Docker deployment (Recommended) | 通过 (无共识错误) |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: The hypothesis contains malformed text and fails to provide any translation of the source content. | Hypothesis contains only placeholder text '...}' instead of actual translation. No valid translation provided. | Incomplete translation. |
| Qwen/Qwen3.5-4B | Title: Cocopilot Feeding-Level Tutorial, Paired with OAIPro, No More Overtime! Content: 1 Clone Project git clone... 2 Deployment Method Cocopilot supports Docker deployment and local deployment, works on Linux and Windows. 2.1 Method One Docker Deployment (Recommended) |
严重
[准确性]
"cocoliot"
理由: Typo in the hypothesis ('Cocoliot' vs 'Cocopilot'/'CoCopilot'). | Mistranslation: 'Cocopilot supports' should be 'CoCopilot supports' as in reference - wrong project name | Typo/mistranscription of “cocopilot” / “CoCopilot” |
| Qwen/Qwen3.5-9B | Title: Cocopilot beginner-friendly tutorial, paired with oaipro, no more overtime! Content: 1 Clone project git clone... 2 Deployment methods Cocopilot supports Docker deployment and local deployment, usable on both Linux and Windows. 2.1 Method one: Docker deployment (recommended) | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Cocopilot Feeding Level Tutorial, combining oaipro, and no more overtime! | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: cocopilot Beginner Tutorial, paired with oai, no more overtime! | 通过 (无共识错误) |
| google/gemma-3-1b-it | Cocopilot Feeding Guide Tutorial, paired with oaipro, no more overtime! |
轻微
[术语]
"Cocopilot Feeding Guide Tutorial"
理由: '喂饭级' should be translated as 'spoon-feeding (step-by-step)' not just 'Feeding Guide'. Reference provides more accurate terminology. | Mistranslation of '喂饭级教程' (spoon-feeding/step-by-step tutorial) as 'Feeding Guide Tutorial', which sounds like instructions for feeding a pet or person, not a technical tutorial style. | "喂饭级教程" means spoon-feeding/hand-holding tutorial; "Feeding Guide" is understandable but somewhat off and unnatural as a tutorial descriptor. |
| google/gemma-3-4b-it | Title: cocopilot meal tutorial, compatible with oaip, never overtime! | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: cocopilot Beginner Tutorial, paired with oai, no more overtime! | 通过 (无共识错误) |
| google/translategemma-4b-it | cocopilot feeding level tutorial, compatible with oaipro, no more overtime! | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | "title": "cocopilot喂饭级教程,搭配oaipro,再也不加班!", "content": "1 克隆项目 git clone ... 2 部署方式 cocoliot支持docker部署、本地部署,适用于Linux和Windows系统。 2.1 方式一:docker部署(推荐)" } | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Cocopilot’s Easy-to-Use Tutorial – Combined with OAIPRO, No More Overtime! | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: The True Nature of the Free API Site |
严重
[准确性]
"The True Nature of the Free API Site"
理由: Should be 'Free API proxy sites are truly great philanthropists' not 'The True Nature of the Free API Site'. The meaning is significantly different. | Incorrect translation: The title '公益站真的是大慈善家' means 'Free API sites are truly great philanthropists', not 'The True Nature of the Free API Site'. The translation loses the figurative meaning. | The title is rewritten as an abstract summary, losing the praise for the free API sites and the “philanthropist” metaphor in “公益站真的是大慈善家”; both tone and meaning are off, and the body content is missing entirely. | Mistranslation of '大慈善家' (great philanthropists) as 'True Nature', losing the specific praise intended. | Should be 'Free API proxy sites are truly great philanthropists' for better clarity. |
| CohereLabs/tiny-aya-water | Title: The True Nature of the Free Shared API Site |
严重
[准确性]
"The True Nature of the Free Shared API Site"
理由: Incorrect translation - should be 'Free API proxy sites are truly great philanthropists' not 'The True Nature of the Free Shared API Site' | The title is completely rewritten and loses the meaning of '大慈善家' (great philanthropist) and the sarcastic/grateful tone. Also, Content is missing entirely. | Title is paraphrased and loses evaluative emphasis that "公益站真的是大慈善家" conveys (literally: free API sites are real philanthropists). The nuance of praise/philanthropy is not preserved. |
| Qwen/Qwen2.5-14B-Instruct | Title: The Free Shared API Site is Truly a Great Philanthropist Content: My cursor, kiro, and other AIs have all expired one after another. Now I'm using the free shared API site here, which has truly saved my life. I thought I'd have to spend money on Xiao Huang Yu again, but unexpectedly, there's light at the end of the tunnel. Thank you, the great philanthropists of the free shared API site. |
严重
[准确性]
"Xiao Huang Yu"
理由: Failure to translate/localize '小黄鱼'. It is a nickname for 'Xianyu' (Idle Fish), a second-hand marketplace. Translating it as a proper name without explanation or using the English name 'Xianyu' (or 'second-hand market') is better. | Common platform is "闲鱼 (Xianyu)"; "小黄鱼" is inaccurate transcription and may confuse. | Should be 'Xianyu (second-hand market)' not 'Xiao Huang Yu' - incorrect transliteration | Reference uses 'spend money on Xianyu (second-hand market)' - 'Xiao Huang Yu' is incorrect transliteration |
| Qwen/Qwen2.5-7B-Instruct | Title: The Public Station Is Truly a Lifesaver Content: My cursor and kiro AI have all expired one after another, now relying on the public station here, it's truly a lifesaver. I thought I would have to go back to Xiao Huangyu again, but unexpectedly, there's a turn for the better. Thank you so much to the Public Station gurus, we can finally enjoy good services. |
严重
[术语]
"Public Station"
理由: Mistranslation of '公益站'. In this context, it refers to 'Free API proxy sites' or 'Community free stations', not a generic 'Public Station'. | Imprecise - should be 'Free API proxy sites' not 'Public Station' | Incorrect: '公益站' means 'free API proxy site', not 'Public Station'. Reference correctly uses 'Free API proxy sites' |
| Qwen/Qwen3-14B | Title: The Free Shared API Site Is Truly a Great Philanthropist Content: My cursor, kiro and other AIs have expired one after another. Now using the site's free shared API site has really saved my life. I thought I would have to consume on Xiaohuangyu again, but didn't expect to see another village after the willows and flowers, thank you to the great philanthropist of the public site, now I can enjoy the good ones. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: The free shared API site is truly a savior Content: My cursor and kiro AIs have expired one after another. Now using the public proxy on this site has saved me. I thought I'd have to spend money on Xianyu again, but instead found a better solution. Thanks to the community leaders here, I'm finally enjoying good things. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: The public proxy is truly a great benefactor Content: My cursor and kiro AIs have expired one after another, and now I'm using the free shared API on this site, which has saved my life. I thought I'd have to spend money on XHuangyu again, but unexpectedly found a new solution. Thank you to the experts at the public proxy; finally enjoying premium services. |
严重
[术语]
"public proxy"
理由: “公益站” refers to public-welfare/free API proxy sites; “public proxy” is too generic and misses the charitable/free nuance present in both source and reference. | '公益站' means 'free API proxy sites', not 'public proxy'. Should be 'Free API proxy sites' | '公益站' is 'charity/free public station'. 'Public proxy' is okay, but 'Free API proxy sites' is more descriptive. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Free Shared API Site is Truly Generous | 通过 (无共识错误) |
| google/gemma-3-12b-it | Free API sites really are lifesavers. | 通过 (无共识错误) |
| google/gemma-3-1b-it | "en": "The '公益站' is really a big philanthropist.", "zh-TW": "小弟的Cursor、kiro 等ai 相继倒下,現在用著站裡嘅公益站,真係救大命了,本以為又得去小黃魚消費了,唔係想再買貴東西,原來柳暗花明又一村,謝謝公益站大佬,吃咗好嘅了。", "ja": "小弟のクローカー、キロなど、以前は相次いで倒壊し、今では公益サイトを利用して救命だ。本気でまた小黄魚を買い占めようと思っていたが、実は柳暗の花明が再び現れ、感謝する公益サイトのオーナーに食べさせられた。", "ko": "소녀의 커터, 키로 등과 같이 이전에는 잦은 파산이 있었지만, 지금은 공익 사이트를 이용해 생명을 구했다. 원래는 또 다른 물건을 사려고 했지만, 결국 유미가 꽃이 다시 피었다고 해서 감사하는 공익 사이트 주인에게 먹여졌다.”, "de": "Der 'Freie API'-Dienst ist wirklich ein großer Philanthrop." , "fr": "Le 'API gratuit' est vraiment un grand philanthrope.", "it": "L' 'API gratuito' è davvero un grande benefattore." } |
致命
[准确性]
""en": "The '公益站' is really a big philanthropist.""
理由: The hypothesis provides a JSON object with multiple languages instead of a single Chinese-to-English translation. The English segment leaves '公益站' untranslated. The other language segments contain severe hallucinations and errors (e.g., Korean '소녀' meaning 'girl' instead of 'I/junior', Japanese '食べさせられた' meaning 'was forced to eat'). The format is completely wrong for the task. | Hypothesis provides only partial translation and incorrect format. Should translate full content about expired AI accounts and free API services. '公益站' means 'free API proxy sites' not just '公益站'. | The hypothesis provides translations in multiple languages instead of Chinese (zh). The content translation is completely missing. |
| google/gemma-3-4b-it | Title: Public API sites are really big philanthropists Content: My cursor, kiro, and other AI tools have successively expired and fallen down. Now I am using public API sites, which has saved my life. I thought I had to go to Xiaohongyu again to consume, but unexpectedly, there was a bright spot after the darkness, thank you to the public API site masters, and I can eat well. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Is the public API really that generous? Content: My cursors, kiro, and other AIs have successively expired and stopped working. Now I'm using the public API provided by this site, which is truly a lifesaver. I thought I would have to spend money on Xiaohuangyu again, but unexpectedly, things turned out better than expected. Thank you to the developers of the public API; now I can use high-quality services. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Public API sites are truly lifesavers |
轻微
[风格]
"are truly lifesavers"
理由: Source says 'great philanthropists' (大慈善家), hypothesis changes metaphor to 'lifesavers'. Acceptable but shifts tone. | Should be 'philanthropists' | Source metaphor “大慈善家” emphasizes generosity; “lifesavers” focuses on utility rather than philanthropy nuance, though sentiment remains positive |
| tencent/HY-MT1.5-1.8B | Translation of '公益站真的是大慈善家' in en |
严重
[准确性]
"Translation of '公益站真的是大慈善家' in en"
理由: Hypothesis is placeholder text, not actual translation | The hypothesis is a meta-string describing the task. It fails to provide the actual translation of the title or the content. | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | Title: The Public Welfare Site Is Truly a Great Philanthropist | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Summary】 Seeing various abbreviations like cpa, ccr, and cc while posting, for those unfamiliar with related projects, it's completely unclear what to say. | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: 【Summary】 Seeing various abbreviations like cpa, ccr, and cc while posting, for those unfamiliar with related projects, it's completely unclear what to say. | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: [Summary] Seeing various abbreviations while browsing posts, such as CPA, CCR, CCS; for those unfamiliar with the projects, they have no idea what is being said Content: Therefore, I am collecting and summarizing some real meanings behind these abbreviations. Here are a few that I know: CC: claudecode ACE: augment context engine CCR: GitHub - musistudio/claude-code-router | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Summary] Seeing various abbreviations while posting threads, such as cpa, ccr, ccs, many people who are unfamiliar with related projects have no idea what they mean Content: So we're collecting and summarizing the true meanings of some abbreviations. Here are some I know: cc: Claude Code Ace: Augmented Context Engine CCR: GitHub - musistudio/claude-code-router | 通过 (无共识错误) |
| Qwen/Qwen3-14B | 【Riassunto】Nel navigare tra i post ho visto varie abbreviazioni come CPA, CCR, CCS che per chi non è familiare con i progetti correlati sono completamente sconosciute |
致命
[准确性]
"【Riassunto】Nel navigare tra i post ho visto varie abbreviazioni come CPA, CCR, CCS che per chi non è familiare con i progetti correlati sono completamente sconosciute"
理由: The hypothesis translates only the title and omits the entire content paragraph listing specific abbreviations and their meanings. | Translation is in Italian instead of target language Chinese (zh). Complete language mismatch. Content section is also missing. | Reproduces only the title; omits the request to collect and summarize the abbreviations and the examples provided (CC, ACE, CCR with GitHub link). |
| Qwen/Qwen3.5-4B | Title: [Summary] Encountering various abbreviations while scrolling through posts, such as cpa, ccr, ccs. For those unfamiliar with related projects, it's completely unclear what they mean. Content: Therefore, I'm collecting and summarizing the true meanings of some abbreviations. Here are some I know: cc: claudecode ace: augment context engine ccr: GitHub - musistudio/claude-code-router | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [Summary] Encountering various abbreviations while browsing posts, such as cpa, ccr, ccs. For those unfamiliar with related projects, it's completely unclear what is being discussed. Content: So I'm collecting and summarizing the real meanings of some abbreviations. Here are some I know: cc: claudecode ace: augment context engine ccr: GitHub - musistudio/claude-code-router | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Summary of posts seeing various abbreviations, such as CPA, CCR, CCS, for those unfamiliar with related projects, they mean nothing. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: [Summary] I saw various abbreviations when browsing posts, such as CPA, CCR, CCS, which is completely unknown to people who don’t understand the relevant projects. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Summary: Understanding common abbreviations like cpa, crc, ccs for those unfamiliar with the project. |
严重
[准确性]
"crc"
理由: The hypothesis summarizes the intent rather than translating the text. It misses the specific examples provided in the source (cc, ace, ccr definitions). It also changes 'ccr' to 'crc'. | The hypothesis is in English instead of Chinese (target language is zh). It also mistranslates 'ccr' as 'crc' and is missing the content translation. | Incomplete translation. Missing 'CCS' in source becomes 'crc' in hypothesis. Does not translate the content section with abbreviation definitions. | The source abbreviation is "ccr"; hypothesis changes it to "crc". |
| google/gemma-3-4b-it | Title: [Summary] Seeing various abbreviations when brushing posts, such as CPA, CCR, CCS, for people who don’t understand related projects, they have no idea what it means. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [Summary] I came across various abbreviations while browsing posts, such as CPA, CCR, and CCS. For those unfamiliar with the relevant projects, it's completely unclear what they mean. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: [Summary] Seeing various abbreviations when browsing posts, such as cpa, ccr, ccs, which completely confuses people who don't know the related projects. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'cc' with LinuxDo community context |
严重
[准确性]
"Translation of 'cc' with LinuxDo community context"
理由: Hypothesis is placeholder text, not actual translation; source is about abbreviations summary post, not just 'cc' | The translation does not provide the actual content but only mentions the translation. | The hypothesis attempts to define a single abbreviation 'cc' instead of translating the full source text which discusses multiple abbreviations (CPA, CCR, CCS) and asks for a summary. |
| tencent/HY-MT1.5-7B | "en": "Title: [Summary] Seeing various abbreviations when browsing posts, such as CPA, CCR, CCS. For those who don’t understand these terms, they have no idea what they mean.", "zh-TW": "标题:[摘要] 浏览帖子时会看到各种缩写,例如CPA、CCR、CCS。对于不了解这些术语的人来说,他们完全不知道这些代表什么意义。", "ja": "タイトル:[要約] ポストを閲覧すると様々な略語が見られます。例えばCPA、CCR、CCSなどです。これらの用語を知らない人にとっては、何を意味しているのか全くわかりません。", "ko": "제목: [요약] 게시물을 읽을 때 CPA, CCR, CCS와 같은 다양한 약어를 접하게 됩니다. 이러한 용어에 대해 잘 모르는 사람들에게는 그 의미가 전혀 명확하지 않습니다.", "de":="Titel: [Zusammenfassung] Beim Durchsehen von Beiträgen stößt man auf verschiedene Abkürzungen wie CPA, CCR, CCS. Für diejenigen, die diese Begriffe nicht kennen, ist es völlig unklar, was sie bedeuten.", "fr":="Titre: [Résumé] En lisant les posts, on rencontre diverses abréviations telles que CPA, CCR, CCS. Pour ceux qui ne connaissent pas ces termes, il est impossible de comprendre leur signification.", "it":="Titolo: [Riassunto] Leggendo i post si incontrano diverse abbreviazioni come CPA, CCR, CCS. Per coloro che non conoscono questi termini, è del tutto impossibile capirne il significato." | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is this the secretly updated translation model? The button for switching languages has appeared at the top right corner. |
严重
[准确性]
"the secretly updated translation model"
理由: Ignores that “始皇” is the site admin's nickname; reducing it to the “the” before “translation model” weakens the target of the playful accusation in “偷偷熱更” (secretly hot-updated), so it would be better to name the admin explicitly, although the overall meaning is largely preserved. | Mistranslation of '始皇' (a specific nickname for the admin 'Neo' or 'Qin Shi Huang' metaphor) as 'this'. | Incorrect translation: '始皇' is a nickname for the admin 'Neo', not a literal 'translation model'. '热更' means 'hotfix', not 'updated'. | Should be 'Did Neo (the admin) secretly hotfix the translation model?' The source refers to an admin named '始皇' (Neo), not just a generic update. |
| CohereLabs/tiny-aya-water | Title: Is this the secretly updated translation model? The button for switching language has appeared suddenly at the top right corner. |
严重
[准确性]
"Is this the secretly updated translation model?"
理由: Misses the specific reference '始皇' (Qin Shi Huang), which is a nickname for the site admin 'Neo'. Translating as 'this' loses the specific subject. Also omits Content translation. | Slightly off; source rhetorically asks if the admin secretly hot-updated the translation model (热更) rather than if this is that model. Nuance is minor but present. | Missing context - should reference 'Neo (the admin)' or 'the admin' not just 'this'. Also '始皇' is a nickname for the admin |
| Qwen/Qwen2.5-14B-Instruct | Title: Is this a secret hotfix for the translation model by the Admin? A switch language button suddenly appeared in the top right corner Content: There's an extra translation button in the top right corner now, does that mean the model has been selected? |
轻微
[准确性]
"by the Admin"
理由: Reference uses 'Did Neo (the admin) secretly hotfix' - 'Neo' is the admin's username | "始皇" is a nickname (First Emperor) for the site owner Neo; "the Admin" is functional but loses the specific nickname flavor. | Should be 'Did Neo (the admin) secretly hotfix the translation model?' - misses the proper name reference | Mistranslation of '始皇'. In the LINUX DO community, '始皇' (First Emperor) is a specific nickname for the founder/admin 'Neo'. Translating it as generic 'Admin' loses the specific cultural reference. |
| Qwen/Qwen2.5-7B-Instruct | Title: Neo is secretly updating the translation model? An option to switch languages suddenly appeared in the top right corner Content: With an option to translate appearing suddenly in the top right corner, has the model already been selected? |
严重
[准确性]
"Neo is secretly updating the translation model?"
理由: '始皇' is a nickname for the admin; translating it as 'Neo' is unsupported by the source or reference. | Mistranslation of '始皇'. In this specific community context, '始皇' is a nickname for the admin/founder, not 'Neo'. If 'Neo' is not the established English name, this is a hallucination or error. | Mistranslation - '始皇' should be translated as 'the admin' or with context, not 'Neo' |
| Qwen/Qwen3-14B | Title: Is the Admin secretly updating the translation model? A switch language button suddenly appeared in the top right corner Content: Seeing a sudden translation button appear in the top right corner, has the model already been selected? | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Did Neo secretly hotfix the translation model? A language switch button suddenly appeared in the top right corner Content: I noticed a new translation button in the top right corner. Has a model been selected already? |
轻微
[准确性]
"Neo"
理由: Should clarify '始皇' refers to admin/Neo in parenthetical | '始皇' is a specific nickname for the admin. Translating as 'Neo' is a hallucination unless 'Neo' is the known English alias (unlikely without context). 'The admin' or transliteration is safer. | “始皇” is a nickname for the forum admin (First Emperor/Qin Shi Huang), not “Neo”; using “Neo” invents a different persona |
| Qwen/Qwen3.5-9B | Title: Did Qin Shi Huang secretly update the translation model? Suddenly, a language switch button appeared in the top right corner Content: Seeing that a translation button suddenly appeared in the top right corner, has the model already been selected? |
严重
[准确性]
"Qin Shi Huang"
理由: '始皇' in this specific community context (LinuxDo) refers to the administrator 'Neo' (or a specific nickname for the admin), NOT the historical emperor Qin Shi Huang. This is a severe hallucination/mistranslation of community slang. | “始皇” here is a nickname for the forum admin (Neo), not the historical emperor Qin Shi Huang; literal translation misidentifies the referent. | '始皇' is a nickname for the admin (Neo), not 'Qin Shi Huang'. Should be 'Neo' or 'the admin' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Is the translation model being updated secretly? Suddenly, there's a language switch button in the top right corner. |
严重
[准确性]
"Is the translation model being updated secretly?"
理由: Same issue as above: misses the specific reference to the admin ‘始皇’, translating generically. | Misses the specific reference '始皇' (Neo/The Admin). The hypothesis generalizes it to a passive voice question, losing the subject. | Mistranslates '始皇' as generic admin action. Reference shows '始皇' refers to 'Neo (the admin)' - a specific person/role name. |
| google/gemma-3-12b-it | Title: Did Neo secretly update the translation model? A language switch button suddenly appeared in the upper right corner Content: Seeing a new translation button appear in the upper right corner, has the model already been selected? | 通过 (无共识错误) |
| google/gemma-3-1b-it | "en": "Looking at the upper right corner, a new translation button appeared suddenly, has the model already been selected?", "zh-TW": "看着右上角的按钮突然出现了一个翻译按钮,是否已经选择了模型了?", "ja": "画面右上の角に突然翻訳ボタンが現れた、モデルはすでに選択したのだろうか?", "ko": "상단 오른쪽 모서리에 갑자기 번역 버튼이 나타났는데, 모델은 이미 선택했는지 확인되었습니까? (The Korean version is more verbose and might be better suited for a user interface)", "de": "Auf dem oberen rechten Bildschirmrand erschien plötzlich eine neue Übersetzungstaste, wurde das Modell bereits ausgewählt?“, "fr": "Sur l'écran supérieur droit, une nouvelle touche de traduction est apparue soudainement, le modèle a-t-il déjà été sélectionné ?", "it": "Su angolo superiore destro, è apparso improvvisamente un nuovo pulsante di traduzione, il modello è stato già selezionato?" } |
致命
[准确性]
"has the model already been selected?"
理由: Source nuance is "has the model already been finalized/chosen"; "selected" is close but slightly weaker, though acceptable. | The hypothesis provides translations in multiple languages instead of Chinese (zh). The content translation is completely missing. | Hypothesis provides multiple language translations instead of focusing on the requested target language. Missing context about '始皇' (admin name) and 'hotfix'. Incomplete and improperly formatted response. | The hypothesis provides a JSON object with multiple languages and meta-commentary instead of a direct translation. It fails to translate '始皇' (Neo/Admin nickname) in the title. The format is incorrect. | Mistranslation and addition of irrelevant details. |
| google/gemma-3-4b-it | Title: Is Neo secretly hot-updating the translation model? Suddenly, there’s a new language switching button in the top right corner. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Did Neo secretly update the translation model? A language switching button suddenly appeared in the top right corner. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: 始皇 Is this secretly updating the translation model? Suddenly, there's a language switching button in the top right corner. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '始皇这是偷偷热更了翻译模型?突然右上角多了切换语言的按钮' in en |
严重
[准确性]
"Translation of '始皇这是偷偷热更了翻译模型?突然右上角多了切换语言的按钮' in en"
理由: Hypothesis is placeholder text, not actual translation | The hypothesis is a meta-string describing the task. It fails to provide the actual translation of the title or the content. | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | "en": "Title: Has the translation model been secretly updated? Suddenly, there’s a button to switch languages in the top right corner.", "zh-TW": "标题:翻译模型被秘密更新了吗?突然间,右上角出现了切换语言的按钮!", "ja": "タイトル:翻訳モデルがこっそりアップデートされたのか?突然、右上隅に言語切り替えボタンが出現した!", "ko": "제목: 번역 모델이 몰래 업데이트되었나요? 갑자기 오른쪽 상단에 언어를 전환하는 버튼이 나타났습니다!", "de":="Titel: Wurde das Übersetzungsmodell heimlich aktualisiert? Plötzlich erscheint in der oberen rechten Ecke ein Button zum Wechseln der Sprache.", "fr":="Titre: Le modèle de traduction a-t-il été mis à jour en secret ? Soudainement, un bouton pour changer de langue apparaît dans le coin supérieur droit.", "it":="Titolo: Il modello di traduzione è stato aggiornato di nascosto? All’improvviso, nella parte superiore destra appare un pulsante per cambiare lingua."}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is there any cheap banana pro for the big banana transfer channel |
严重
[准确性]
"banana pro for the big banana transfer channel"
理由: Literal and confusing translation of '大香蕉中转渠道' (Banana Pro relay/proxy channel). 'Big banana' is redundant and awkward. | “banana pro 大香蕉中转渠道” refers to relay/proxy channels for Banana Pro; the translation fails to localize it, keeps “banana” in a confused structure, and leaves readers unable to tell that an API/relay service is meant. | Should be 'Are there any cheap API proxy channels for Banana Pro?' The hypothesis is confusing and doesn't accurately convey the meaning. | Incorrect translation: '中转渠道' means 'proxy channel/API proxy', not 'transfer channel'. The title loses the technical meaning. |
| CohereLabs/tiny-aya-water | Title: Is there any cheap banana pro for the middleman of Da Xiang Chao |
致命
[准确性]
"Is there any cheap banana pro for the middleman of Da Xiang Chao"
理由: Completely garbled translation - should be 'Are there any cheap API proxy channels for Banana Pro?' The hypothesis is incomprehensible | Complete mistranslation. '大香蕉中转渠道' means 'Big Banana (Banana Pro) relay/proxy channel'. 'Da Xiang Chao' is a nonsensical transliteration of a misinterpreted phrase. '中转' means relay/proxy, not 'middleman of...'. | Misinterprets the whole phrase. "香蕉pro" is a specific service/product (Banana Pro) and "大香蕉中转渠道" are proxy/relay channels for it. The hypothesis is not meaningful in English and does not convey the idea of cheap proxy channels. |
| Qwen/Qwen2.5-14B-Instruct | Title: Is there any cheap Banana Pro Big Banana transit channel? Content: The cheapest one I found is 0.23 yuan per image. As my daily raw image generation volume is around 200 to 500 images, I'm wondering if there's any way to further reduce costs. |
严重
[准确性]
"Big Banana transit channel"
理由: Mistranslation of '大香蕉中转'. 'Banana Pro' is the model/service. 'Big Banana' is likely a humorous repetition or part of the name, but 'transit channel' for '中转渠道' (proxy/relay channel) is okay. However, 'Banana Pro Big Banana' sounds redundant and confused. The reference simplifies to 'API proxy channels for Banana Pro'. | Reference uses 'API proxy channels for Banana Pro' - 'Big Banana' is incorrect, 'transit channel' is wrong term | Should be 'API proxy channels' not 'Big Banana transit channel' - misses the meaning |
| Qwen/Qwen2.5-7B-Instruct | Title: Is There an Affordable Banana Pro Transfer Channel? Content: The cheapest I found is 23 cents per image. With a large amount of raw images, I generate around 200 to 500 images per day and wonder if there's a way to save on costs. |
严重
[准确性]
"23 cents per image"
理由: Source price is '2毛3一张图' (0.23 RMB); target omits the currency unit 'RMB', which can mislead. | Mistranslation - should be '0.23 RMB' not '23 cents' | Incorrect: '2毛3' means '0.23 RMB' (Chinese yuan), not '23 cents' (USD). Reference correctly uses '0.23 RMB' | Currency error. '2 毛 3' is 0.23 RMB, not 23 cents (USD).
严重
[术语]
"Banana Pro Transfer Channel"
理由: Incorrect: 'Banana Pro' likely refers to an API service, '中转渠道' means 'proxy channel/API proxy', not 'Transfer Channel'. Reference correctly says 'API proxy channels for Banana Pro' | '中转渠道' in this context refers to an 'API proxy channel' or 'relay', not a generic 'transfer channel'. | Mistranslation - should be 'Banana Pro API proxy channels' not 'Transfer Channel' |
| Qwen/Qwen3-14B | Gibt es einen günstigen Banana Pro Da Xiangjiao-Relay-Kanal? |
致命
[准确性]
"Gibt es einen günstigen Banana Pro Da Xiangjiao-Relay-Kanal?"
理由: The hypothesis translates only the title and omits the entire content paragraph. It also unnecessarily transliterates '大香蕉' (Big Banana) as 'Da Xiangjiao' instead of translating it. | Does not mention the per-image cost context, the daily generation volume (200–500), or the goal of further reducing costs. | Translation is in German instead of target language Chinese (zh). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Are there any affordable channels for generating large images? Content: The cheapest option I found is 0.23 per image. Given my high volume of generation (around 200-500 images daily), I was wondering if we could further reduce costs. |
严重
[准确性]
"channels for generating large images"
理由: Should be 'Banana Pro API proxy channels' not 'generating large images' | Missed the specific proper noun 'banana pro' (an AI image generation API service), translating it descriptively as 'large images'. | Source asks for cheap Banana Pro proxy/API channels; “generating large images” omits Banana Pro context and proxy aspect
轻微
[准确性]
"0.23 per image"
理由: Currency (RMB) omitted; might cause ambiguity | Should specify '0.23 RMB per image' | Missing currency unit (RMB/Yuan) which is implied in the source. |
| Qwen/Qwen3.5-9B | Title: Are there any cheap banana pro large banana relay channels? Content: The cheapest I found is 0.23 per image. I generate a lot of images personally, about 200-500 per day, wondering if costs can be further reduced. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Is there an affordable banana pro channel? "Big Banana" intermediary |
严重
[准确性]
"banana pro channel? "Big Banana" intermediary"
理由: “banana pro 大香蕉中转渠道” refers to cheap proxy/API relay channels for Banana Pro; the translation does not convey ‘proxy/API relay’ and leaves ‘Big Banana’ unclear. | '中转渠道' means 'proxy channel' or 'API proxy channel' not 'intermediary'. | Mistranslates '大香蕉' as 'Big Banana intermediary'. Reference shows it should be 'API proxy channels for Banana Pro' - the slang term refers to the service, not a literal banana. |
| google/gemma-3-12b-it | Title: Are there any cheap channels for banana pro large bananas transshipment? Content: The cheapest I found is 0.23 yuan per picture. I personally generate a large volume of images, about 200-500 per day, and I was wondering if I could save some costs. | 通过 (无共识错误) |
| google/gemma-3-1b-it | The cheapest banana import channel is around 2 yuan and 3 RMB. |
严重
[准确性]
"banana import channel"
理由: The hypothesis is in English instead of Chinese (target language is zh). It also mistranslates 'banana pro' (an API proxy service) as a fruit, '中转渠道' (proxy channel) as 'import channel', and '2毛3' (0.23 RMB) as '2 yuan and 3 RMB'. | "大香蕉中转渠道" refers to Banana Pro proxy channel, not literal banana import. | Mistranslation of cost. | '大香蕉' (Banana Pro) is an API service, not literal bananas. '2毛3' means '0.23 RMB' not '2 yuan and 3 RMB'. Misses context about image generation pricing. | Severe mistranslation of price: '2 毛 3' (0.23 RMB) is translated as '2 yuan and 3 RMB'. 'banana pro' context (image generation API) is lost, translated as 'banana import channel'. The rest of the content about volume and cost saving is missing. |
| google/gemma-3-4b-it | Title: Are there cheap banana pro transfer channels? Content: The cheapest I found is 2.3 yuan per picture. I personally do a lot of shooting, generating about 200-500 pictures a day, thinking that I could also save some costs. |
致命
[准确性]
"2.3 yuan per picture"
理由: Mistranslation of '2 毛 3' (0.23 yuan). '2 毛 3' is 2 mao 3 fen, not 2.3 yuan. | Order of magnitude error; should be 0.23 RMB per image, not 2.3 yuan. | The term '2.3 yuan' is incorrect; the correct term is '0.23 RMB'. |
| google/translategemma-12b-it | Title: Are there any cheap channels for reselling "banana pro" (large bananas)? Content: The cheapest option I found is 0.23 yuan per image. I personally generate a large volume of images, about 200-500 per day, and I was wondering if there's a way to further reduce costs. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Are there any cheap banana pro transfer channels? |
严重
[术语]
"banana pro transfer channels"
理由: '中转' in this context means 'proxy' or 'relay' for API, not 'transfer'. 'Banana Pro' is a specific AI model service. | Misses “中转渠道” nuance as API proxy/relay; should be “proxy/relay channels for Banana Pro API” or similar to match reference tech meaning | Should be 'API proxy channels' | Should be 'API proxy channels for Banana Pro' not 'banana pro transfer channels'. The Chinese '中转渠道' means proxy/relay channels, and 'Banana Pro' is a service name. |
| tencent/HY-MT1.5-1.8B | Translation of '有没有便宜的 banana pro 大香蕉中转渠道' in en |
严重
[准确性]
"Translation of '有没有便宜的 banana pro 大香蕉中转渠道' in en"
理由: Hypothesis is placeholder text, not actual translation | The hypothesis is a meta-string describing the task. It fails to provide the actual translation of the title or the content. | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | "en": "Title: Are there any cheap channels to purchase Banana Pro? Looking for cheaper alternatives for large quantities of images.", "zh-TW": "标题:有沒有便宜的渠道可以購買Banana Pro?正在為大量圖像尋找更便宜的替代方案。", "ja": "タイトル:Banana Proを安く購入できるチャネルはありますか?大量の画像について、もっと安価な代替手段を探しています。」, "ko": "제목: Banana Pro를 저렴하게 구입할 수 있는 채널이 있나요? 대량의 이미지에 대해 더 저렴한 대안을 찾고 있습니다。」, "de":="Titel: Gibt es günstige Kanäle, um Banana Pro zu erwerben? Ich suche nach kostengünstigeren Alternativen für große Mengen an Bildern.", "fr":="Titre : Y a-t-il des canaux abordables pour acheter Banana Pro ? Je cherche des alternatives moins chères pour de grandes quantités d’images.", "it":="Titolo: Esistono canali economici per acquistare Banana Pro? Sto cercando soluzioni più convenienti per grandi lotti di immagini."}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Have you heard that gravity can be reversed? Do anyone have been re-arrested and voted? |
严重
[准确性]
"Have you heard that gravity can be reversed?"
理由: Incorrect translation: '反重力' refers to 'Antigravity' (a service), not 'gravity'. '反代' means 'reverse proxy', not 'gravity can be reversed'. '被封' means 'banned', not 're-arrested'. '投票' means 'vote', not 'voted' as a verb phrase. | “反重力” is the name of a service/site, not physical gravity; the literal rendering “gravity can be reversed” is a serious misreading. | Mistranslation of '反重力' (Antigravity, a specific tool name) as a physics concept. | Should be 'I heard Antigravity can be reverse proxied again? Has anyone been banned again? Come in and vote.' The hypothesis is grammatically incorrect and semantically inaccurate. |
| CohereLabs/tiny-aya-water | Title: Have you heard that gravity can be reversed? Do anyone have been re-closed for voting? |
严重
[准确性]
"Have you heard that gravity can be reversed?"
理由: Mistranslation of '反重力'. In this context, it refers to a specific service/tool (Antigravity), not the physical concept of reversing gravity. | Grammatically incorrect and inaccurate - should be 'I heard Antigravity can be reverse proxied again? Has anyone been banned again? Come in and vote.' | Misreads "反重力" (Antigravity service) as literal gravity; changes subject entirely. |
| Qwen/Qwen2.5-14B-Instruct | Title: I heard that Anti-Gravity proxies are working again? Can everyone who has re-enabled them vote?, Content: Yesterday I saw that Anti-Gravity seems to be unblocked and some users have started using proxies again. But we're still afraid of getting blocked again. Please, can those who are already using proxies for over 24 hours without being banned cast your votes so we can see the situation? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Is anti-gravity now reversing again? Does anyone who was resealed want to vote? |
严重
[准确性]
"Is anti-gravity now reversing again?"
理由: '反重力又可以反代了' means 'Antigravity can be reverse proxied again'; 'now reversing again' is incorrect. | Incomplete and unclear translation - should be 'I heard Antigravity can be reverse proxied again?' | Severe mistranslation and incompleteness. '反代' is 'reverse proxy', not 'reversing'. '被封' is 'banned', not 'resealed'. The entire content paragraph is missing. | Incorrect: Title missing entirely, only hypothesis shows 'Is anti-gravity now reversing again?' which is a question, not a title |
| Qwen/Qwen3-14B | Sentito dire che l'antigravità possa essere usata di nuovo per il proxy? Chi è stato nuovamente bloccato, vota pure |
致命
[准确性]
"Sentito dire che l'antigravità possa essere usata di nuovo per il proxy? Chi è stato nuovamente bloccato, vota pure"
理由: Translation is in Italian instead of target language Chinese (zh). Complete language mismatch. | The hypothesis translates only the title and omits the entire content paragraph explaining the context and request for votes. | Omits the descriptive part about the ban being lifted, the fear of new bans, and the request that bros already reverse proxying vote after 24h without a ban so that statistics can be gathered. |
| Qwen/Qwen3.5-4B | Heard that Anti-Gravity can anti-proxy again? Has anyone else been banned and re-banned? Yesterday I checked and it seems Anti-Gravity has been unbanned, so some users have started proxying again, but they're still afraid of being banned again. Please vote for those who have already been proxyed to see if we can get an idea of how many people have been proxyed without getting banned. |
严重
[准确性]
"Has anyone else been banned and re-banned?"
理由: Mistranslation of '重新被封进来投个票' (If you've been banned again, come in and vote). The hypothesis asks a question instead of issuing the call to action. | Title is incomplete and unclear - should be 'I heard Antigravity can be reverse proxied again? Has anyone been banned again? Come in and vote.' | Title asks “有没有重新被封” = anyone banned again; “else” and “re-banned” are okay, but later content about voting is omitted |
| Qwen/Qwen3.5-9B | Title: Rumor has it that anti-gravity can be reverse-proxied again? Please vote if you've been re-banned and joined back. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Have you guys been re-banned after refollowing gravity again? Please vote if you've been re-banned. |
致命
[准确性]
"Have you guys been re-banned after refollowing gravity again? Please vote if you've been re-banned."
理由: Complete mistranslation of '反重力' (Antigravity, a slang for proxy tool) as 'refollowing gravity'. The meaning is entirely lost. | Completely misinterprets ‘反重力又可以反代了’; it refers to using reverse proxy with Antigravity (a service), not ‘refollowing gravity’; also ignores poll context and conditions from the title and content. | Incorrect translation: '反重力' (Antigravity) should be 'Antigravity' not 'gravity'. Also 're-banned after refollowing' is incorrect - should be 'reverse proxying again'. |
| google/gemma-3-12b-it | Title: Heard anti-gravity can be reverse proxied again? Has everyone been banned and voted again? Content: I checked yesterday and it seems like anti-gravity has been unbanned, and some veterans have started reverse proxying again, but I’m still afraid of getting banned. Please ask those who have already reverse proxied to vote for us so we who haven’t reverse proxied can see the situation. Already reverse proxied for more than 24 hours and not yet banned... | 通过 (无共识错误) |
| google/gemma-3-1b-it | "en": "Did the anti-gravity system finally unlock? Some users are starting to reverse vote again, but they are still afraid and blocked, please ask those who have been blocked to retweet us and count our situation for us.", "zh-TW": "昨天我看了好像反重力解封了,一些用户又开始反代了,但仍然害怕并且被封锁,请问已经被封禁的用户请大家 retweet 我们,统计一下给我们没有反代的看看情况吧。", "ja": "昨日、反重力システムが解封したかと思ったのですが、一部ユーザーが再び反代を始めました。しかし、まだ恐れてブロックされているため、ブロックされたユーザーにリツイートをお願いします。すでに24時間以上反代しており、ブロックされていません...", "ko": "어제 반중력 시스템이 해제되었는지 확인했지만, 일부 사용자들이 다시 반대 투표를 시작했습니다. 하지만 여전히 두려워 블록되어 있기 때문에, 블록된 사용자를 리트윗해 주시면 저희에게 상황을 알려주세요. 이미 24시간 이상 반대 투표가 진행되고 있으며, 블록되지 않았습니다..." "de": "Gestern wurde mir mitgeteilt, dass das Anti-Gravitationssystem gelöst wurde. Einige Nutzer beginnen wieder, zu stimmen, aber sie sind immer noch ängstlich und blockiert, bitte fragen Sie die Benutzer, die bereits blockiert wurden, sich zu retweeten und unsere Situation zu erfassen.", "fr": "Hier, hier, je pensait que le système anti-gravité était enfin débloqué. Certains utilisateurs commencent à voter à nouveau, mais ils sont toujours effrayés et bloqués, veuillez demander aux utilisateurs qui ont été bloqués de retweet nous et de compter notre situation pour nous.", "it": "Ciao, ho sentito dire che il sistema anti-gravità è stato finalmente sbloccato. Alcuni utenti stanno iniziando a votare di nuovo, ma sono ancora spaventati e bloccati, per favore chiedete agli utenti che sono stati bloccati di retweetci e contate la nostra situazione per noi." } |
致命
[准确性]
"anti-gravity system finally unlock"
理由: The hypothesis provides translations in multiple languages instead of Chinese (zh). The content translation is completely missing. | "反重力" is a service (Antigravity) and "又可以反代了" is about being able to reverse proxy it again, not an abstract system unlocking. | The hypothesis provides a JSON object with multiple languages instead of a single translation. The English segment mistranslates '反代' (reverse proxy) as 'reverse vote'. It also mistranslates '投个票' (vote in a poll) as 'retweet'. The format is incorrect. | Hypothesis provides multiple language translations instead of single target translation. Contains mistranslations like 'reverse vote' instead of 'reverse proxy'. Improperly formatted response with multiple languages. | Mistranslation and addition of irrelevant details. |
| google/gemma-3-4b-it | Rumor has it that Anti-Gravity can be used for proxying again? Let's vote to see if anyone has been banned again. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Have you heard that anti-gravity is being reverse proxied again? Has anyone been banned recently? Please vote if you have been successfully reverse proxying. |
轻微
[准确性]
"Has anyone been banned recently? Please vote if you have been successfully reverse proxying."
理由: Drops detail about 24+ hours without bans and that voting is for those already reverse proxying; core intent kept but less specific. | Confusing and inaccurate. Should be 'I heard Antigravity can be reverse proxied again? Has anyone been banned again? Come in and vote.' The hypothesis conflates multiple ideas. | The translation omits the specific condition requested in the source ('大家有没有重新被封进来投个票' implies asking those who might have been banned again to vote/report, or generally polling the status). The hypothesis simplifies the nuance of checking if bans have recurred compared to the source's specific request for users to vote on their status. | The translation omits the specific timeframe of 24 hours mentioned in the source. |
| google/translategemma-4b-it | Title: Has anti-gravity been unbanned again? Please vote for those who have been banned again. |
严重
[准确性]
"Please vote for those who have been banned again."
理由: The title is confusing and inaccurate. Should be 'I heard Antigravity can be reverse proxied again? Has anyone been banned again? Come in and vote.' The hypothesis doesn't capture the meaning correctly. | Source asks those who are *using* it (and haven't been banned) to vote. Hypothesis asks those who *have been banned* to vote, reversing the logic. | Reverses intent; source asks those who are already reverse proxying to vote about whether they have been banned, not to vote on behalf of people who were banned again |
| tencent/HY-MT1.5-1.8B | Translation of '听说反重力又可以反代了?大家有没有重新被封进来投个票' in en |
严重
[准确性]
"Translation of '听说反重力又可以反代了?大家有没有重新被封进来投个票' in en"
理由: Hypothesis is placeholder text, not actual translation | The hypothesis is a meta-string describing the task. It fails to provide the actual translation of the title or the content. | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | "en": "Title: Have you heard that anti-gravity can be used for counter-subversion again? Has anyone been re-blocked and want to vote?", "zh-TW": "Title: Have you heard that anti-gravity can be used for counter-subversion again? Has anyone been re-blocked and want to vote?", "ja": "Title: 聞いたことありますか?反重力が再び反逆行為に使えるようになったって?誰かが再びブロックされて、投票したい人はいますか?", "ko": "Title: 들으셨나요? 반중력이 다시 반역 행위에 사용될 수 있다고 하네요. 누군가가 다시 차단되었다면, 투표하고 싶은 분 계신가요?", "de":="Titel: Haben Sie schon gehört, dass Anti-Gravitation wieder zur Unterdrückung von Subversion eingesetzt werden kann? Gibt es jemanden, der wieder blockiert wurde und gerne abstimmen würde?", "fr":="Titre: Avez-vous entendu dire que l’anti-gravité pouvait être utilisée à nouveau pour la subversion ? Y a-t-il quelqu’un qui a été bloqué à nouveau et qui souhaite voter ?", "it":="Titolo: Avete sentito dire che l’anti-gravità può essere utilizzata di nuovo per scopi subversivi? C’è qualcuno che è stato bloccato di nuovo e vuole votare?"} |
严重
[准确性]
"anti-gravity can be used for counter-subversion again"
理由: Confusing translation. Reference clarifies 'Antigravity can be reverse proxied again' which is more accurate to the source meaning. | Completely misreads “反代” (reverse proxy) as “counter-subversion”; changes domain and meaning. | Does not match the reference content. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Total ride to ChatGPT Team is unclear, I don't understand at all |
严重
[准确性]
"Total ride to ChatGPT Team is unclear, I don't understand at all"
理由: “总刷到 ChatGPT Team 拼车” means “keep seeing carpool (shared-subscription/group-buy) posts”, not “total ride”; the translation is structurally confused and fails to convey the sense of “frequently coming across ... posts”. | Should be 'I keep seeing posts about sharing ChatGPT Team subscriptions, but I don't get it.' The hypothesis misses the context of shared subscriptions. | Incorrect translation: '总刷到' means 'keep seeing', not 'total ride'. '拼车' means 'carpooling/shared subscription', not 'ride'. The translation is nonsensical. | Mistranslation of '总刷到' (Keep seeing/Always scrolling into) as 'Total ride'. |
| CohereLabs/tiny-aya-water | Title: Total ride to ChatGPT Team is unclear, I don't understand. |
严重
[准确性]
"Total ride to ChatGPT Team is unclear, I don't understand."
理由: Mistranslation - should be 'I keep seeing posts about sharing ChatGPT Team subscriptions' not 'Total ride to ChatGPT Team is unclear' | Misinterprets "总刷到" (keep seeing) and "拼车" (cost-sharing ride/pooled subscription). The hypothesis is not idiomatic and misses the notion of seeing many shared subscription posts. | Mistranslation of '总刷到...拼车'. '总刷到' means 'I keep seeing/scrolling past'. '拼车' means 'carpooling' (metaphor for group buying/sharing). 'Total ride' is nonsensical. |
| Qwen/Qwen2.5-14B-Instruct | Title: Keep encountering Carpool by ChatGPT Team, couldn't understand Content: Regular plus costs $19.9 but team version is $30; and each seat needs to be paid for, does that mean much higher quota? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Always seeing the ChatGPT Team carpool, didn't get it Content: Regular plus $19.9, but the team version is $30; and everyone has to pay per seat, does that mean the quota is much higher? |
轻微
[风格]
"carpool"
理由: Terminology: '拼车' in this context means 'sharing a subscription', 'carpool' is a metaphor but acceptable. Reference uses 'sharing subscriptions' | 'Carpool' is a literal translation of '拼车'. 'Shared subscription' or 'group buy' is clearer in this context. | '拼车' in subscription context is 'sharing a subscription' or 'group buy'; 'carpool' is literal and slightly awkward but understandable. |
| Qwen/Qwen3-14B | Vedo spesso il carpooling del team ChatGPT, non ci capisco niente. |
致命
[准确性]
"Vedo spesso il carpooling del team ChatGPT, non ci capisco niente."
理由: Translates only the first sentence and omits the details on Plus vs Team pricing, the per-seat cost, and the question about quota limits. | The hypothesis translates only the title and omits the entire content paragraph discussing pricing and quotas. | Translation is in Italian instead of target language Chinese (zh). Complete language mismatch. Content section is missing. |
| Qwen/Qwen3.5-4B | Title: Keep seeing Carpooling with ChatGPT Team, but I don't understand. Content: Regular Plus is $19.9, but the Team version costs $30; and you have to pay for each seat. Is the quota much larger? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Always seeing ChatGPT Team carpooling, don't understand Content: Regular plus is $19.9 but team version costs $30; and each seat requires payment - does this mean significantly higher quota? |
轻微
[术语]
"carpooling"
理由: '拼车' in this context means 'sharing subscriptions', not 'carpooling'. Should be 'sharing ChatGPT Team subscriptions' | “拼车” in SaaS context means sharing a subscription; “carpooling” is metaphorically accurate but may be confusing without context. | '拼车' is literally 'carpooling', used as slang for 'subscription sharing'. 'Carpooling' is acceptable but 'sharing subscriptions' is clearer. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: "Total brushing to ChatGPT Team carpool, didn't understand"; Content: "Normal plus $19.9 but team version costs $30; and each seat needs to be paid, does the budget need to increase a lot?" |
严重
[流畅性]
""Total brushing to ChatGPT Team carpool, didn't understand""
理由: Nonsensical phrase; should be something like ‘I keep seeing ChatGPT Team carpool posts but don’t understand them.’ | The translation is incoherent. 'Total brushing' is a literal and incorrect translation of '总刷到' (Keep seeing/scrolling past). The sentence structure is broken. | Awkward translation. Reference provides clearer version: 'I keep seeing posts about sharing ChatGPT Team subscriptions, but I don't get it.' |
| google/gemma-3-12b-it | Title: I got directed to a carpool for the ChatGPT Team, but I don't understand. Content: Regular Plus is $19.9, but the Team version costs $30; and you have to pay per seat. Does the quota need to be much higher? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of '总刷到 ChatGPT Team 拼车,没看懂啊' in en |
致命
[准确性]
"Translation of '总刷到 ChatGPT Team 拼车,没看懂啊' in en"
理由: No meaningful translation provided. | The hypothesis is not a proper translation; it indicates it is a translation to English but the target language is Chinese (zh). Also, the content is missing entirely. | Hypothesis does not provide actual translation, only states 'Translation of...' without delivering the translation. | The hypothesis contains only meta-text describing the task. The actual content regarding pricing ($19.9 vs $30) and the question about quota is completely missing. | No English translation is given; it's just a meta description. |
| google/gemma-3-4b-it | Title: Keep Getting Offered ChatGPT Team Rides, Can't Understand It Content: Standard Plus $19.9 but team version is $30; and each seat needs to pay, is the quota really that much larger? | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: I keep seeing ads for "Carpooling to the ChatGPT Team," but I don't understand. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Always seeing ChatGPT Team carpooling, but I don't understand. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '总刷到 ChatGPT Team 拼车,没看懂啊' in English |
严重
[准确性]
"Translation of '总刷到 ChatGPT Team 拼车,没看懂啊' in English"
理由: Hypothesis is placeholder text, not actual translation | The hypothesis is a meta-string describing the task. It fails to provide the actual translation of the title or the content. | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | "en": "Title: Finally got to use the ChatGPT Team carpool service, but I don’t understand it.", "zh-TW": "標題:終於用了ChatGPT Team的拼車服務,但不太明白它的工作方式。", "ja": "タイトル:ついにChatGPT Teamのカープールサービスを利用できたけど、使い方がよくわからない。", "ko": "제목: 드디어 ChatGPT Team의 카풀 서비스를 이용했는데, 사용 방법이 잘 이해되지 않습니다.", "de":="Titel: Endlich habe ich den ChatGPT Team Carpool-Dienst genutzt, aber ich verstehe ihn nicht.", "fr":="Titre : J’ai finalement utilisé le service de carpooling de ChatGPT Team, mais je ne comprends pas comment il fonctionne.", "it":="Titolo: Ho finalmente utilizzato il servizio di carpooling di ChatGPT Team, ma non capisco come funziona."}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Does anyone have done an experiment on this? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Are dirty IPs causing my large model to lose intelligence? Can switching to clean home broadband restore it? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Can dirty IPs cause large models to become less intelligent? Will switching to clean broadband restore them? Content: Has anyone done an experiment on this? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | "en": "Will dirty IPs make large models less intelligent? Can switching to clean broadband restore them?", "zh-TW": "脏IP會讓大模型變笨嗎?切換到乾淨寬頻可以恢復嗎?", "ja": "汚染されたIPが大型モデルを賢くしないか?クリーンなブロードバンドに切り替えることで回復できるか?", "ko": "臟IP는 대모델을 어 若要输出正确的JSON格式,请确保每个语言的翻译都符合规则,并且保持结构一致。以下是根据给定规则生成的JSON对象: | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Title: Can dirty IP cause large models to lose intelligence? Can switching to clean home broadband restore it? Content: Has anyone done experiments? | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Will dirty IPs cause LLMs to become less intelligent? Can switching to clean home broadband restore performance? Content: Has anyone conducted experiments? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Will dirty IPs cause large models to become less intelligent? Can switching to clean home broadband restore them? Content: Has anyone experimented with this? |
轻微
[术语]
"become less intelligent"
理由: '降智' means 'shadowbanned/downgraded', not 'become less intelligent'. Should be 'be shadowbanned/downgraded' | “降智” in this tech context often implies downgrade/nerf in capability or quality; “become less intelligent” is literal but misses the shadowban/downgrade nuance highlighted in reference. | '降智' is slang for 'shadowbanned' or 'quality downgraded'. 'Become less intelligent' is a literal translation that sounds odd for an AI model's output quality. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Does a dirty IP cause big model to decrease intelligence? Can switching to a clean home network restore it? | 通过 (无共识错误) |
| google/gemma-3-12b-it | Does a dirty IP address lead to reduced intelligence in large language models? Can switching to a clean residential IP restore performance? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Does a 'noisy IP' lead to reduced intelligence for large models, and can switching to a clean home wide network restore intelligence? |
轻微
[术语]
"reduced intelligence"
理由: '脏IP' should be 'dirty IPs' not 'noisy IP'. '降智' should be 'shadowbanned/downgraded' for better context. | '脏 IP' is better translated as 'dirty IP' in this context (blacklisted/bad reputation) rather than 'noisy'. '家宽' (residential broadband) is awkwardly translated as 'home wide network'. The content question '有没有佬做过实验' (Has any bro done an experiment?) is missing. | "降智" in this community context refers to shadowban/downgrade in quality, not literal IQ; "shadowbanned/downgraded" would be more accurate. |
| google/gemma-3-4b-it | Does Dirty IP Cause Large Models to Decline Intelligence? Can Switching to a Clean Home Network Restore It? | 通过 (无共识错误) |
| google/translategemma-12b-it | Does a dirty IP address cause large language models to become less intelligent? Can switching to a clean residential IP restore performance? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Will dirty IPs cause large models to lose intelligence? Can switching to a clean home network restore them? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '脏IP会导致大模型降智吗,切换成干净家宽能恢复吗' in en |
严重
[准确性]
"Translation of '脏 IP 会导致大模型降智吗,切换成干净家宽能恢复吗' in en"
理由: Hypothesis is placeholder text, not actual translation | The hypothesis is a meta-string describing the task. It fails to provide the actual translation of the title or the content. | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | "en": "Title: Can dirty IPs cause large models to lose their intelligence? Can switching to a clean internet connection restore them?", "zh-TW": "标题:脏IP会导致大型模型失去智能吗?切换到干净的网路连接能恢复它们吗?", "ja": "标题:汚れたIPアドレスが大規模モデルの知能を低下させる可能性はありますか?クリーンなインターネット接続に切り替えると回復できますか?", "ko": "제목: 더러운 IP 주소가 대규모 모델의 성능을 저하시킬 수 있나요? 깨끗한 인터넷 연결로 전환하면 복구될 수 있을까요?", "de":="Titel: Können schmutzige IP-Adressen dazu führen, dass große Modelle an Intelligenz verlieren? Kann ein Wechsel zu einer sauberen Internetverbindung sie wiederherstellen?", "fr":="Titre : Les adresses IP sales peuvent-elles rendre les grands modèles moins intelligents ? Un changement pour une connexion Internet propre peut-il les restaurer ?", "it":="Titolo: Gli indirizzi IP sporchi possono far perdere intelligenza ai grandi modelli? Un cambio a una connessione Internet pulita può ripristinarli?"}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: How cute is Hakiemi |
严重
[准确性]
"How cute is Hakiemi"
理由: In the reference, “哈基米” is a nickname for a cute pet/cat; the translation fails to preserve this meaning, has odd word order (it reads like a question but has no question mark), and the spelling “is Hakiemi” is inconsistent. | Incorrect translation: '哈基米' is a transliteration of 'hachimi' (a cute pet/cat), not 'Hakiemi'. The translation should reflect it's a cute pet. | Should be 'Why is the hakimi (cute pet/cat) so cute?' The hypothesis is incomplete and doesn't capture the full meaning. |
| CohereLabs/tiny-aya-water | Title: How cute is Hakiemi |
严重
[准确性]
"How cute is Hakiemi"
理由: Incomplete translation - should be 'Why is the hakimi (cute pet/cat) so cute?' not just 'How cute is Hakiemi'. Also missing context about what hakimi refers to | '哈基米' is a meme term for a cute cat/pet. Transliterating as 'Hakiemi' loses the cultural meaning. Should be translated as 'cute cat' or explained. | "哈基米" here refers to a cute pet/cat meme; transliteration "Hakiemi" without explanation loses the intended meaning. Also, the question form "How cute is" differs from the exclamatory "Why is ... so cute" tone. |
| Qwen/Qwen2.5-14B-Instruct | Title: Why is Hakimi so cute, Content: Hakimi is so cute |
严重
[准确性]
"Hakimi"
理由: Reference uses 'Why is the hakimi (cute pet/cat) so cute' - 'Hakimi' is a name, but '哈基米' refers to a cute pet/cat | Should be 'hakimi (cute pet/cat)' - missing context that this is a slang term for a cute pet | "哈基米" here is a cute meme/pet nickname, not the footballer Achraf Hakimi; translating as "Hakimi" risks wrong real-person reference. | Failure to recognize '哈基米' (Hajimi) as internet slang for a cute cat/pet (originating from a song). Translating it as a proper name 'Hakimi' loses the meaning entirely. |
| Qwen/Qwen2.5-7B-Instruct | Title: Hajimi is so cute Content: Hajimi is so cute |
严重
[准确性]
"Hajimi"
理由: Incorrect: Should be 'Why is the hakimi so cute?' to match the question format in source and reference | Mistranslation - '哈基米' should be 'hakimi (cute pet/cat)' with context, not 'Hajimi' | '哈基米' here refers to 'hakimi' (a cute pet/cat), not the name 'Hajimi'; misrendered proper noun. | Mistranslation/Transliteration issue. '哈基米' is a meme term often referring to a cute cat/pet. Translating it as a proper name 'Hajimi' without context loses the cultural meaning. |
| Qwen/Qwen3-14B | Title: How cute is HAKIMI Content: HAKIMI is so cute | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: How cute is HachiMi? Content: HachiMi is so cute | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: How can Haki-mi be so cute Content: Haki-mi is so cute | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Hakimi is so cute |
严重
[准确性]
"Hakimi"
理由: Failure to recognize '哈基米' as internet slang for a cute pet/cat (originating from a song). Translating it as a proper name 'Hakimi' loses the cultural context. | Treats '哈基米' as a person's name. Reference shows it should be interpreted as 'hakimi (cute pet/cat)' - a term for cute animals/pets. | ‘哈基米’ here is a cute pet/cat nickname, not necessarily the footballer Hakimi; using the proper name may mislead. |
| google/gemma-3-12b-it | Title: Why is Hakimi so cute? Content: Hakimi is so cute. | 通过 (无共识错误) |
| google/gemma-3-1b-it | The title "哈基米怎么这么可爱" is translated to "Haki-mi how cute is he" in English. |
严重
[准确性]
"Haki-mi how cute is he"
理由: The hypothesis is in English instead of Chinese (target language is zh). It also provides an incorrect translation of '哈基米' (a term for a cute pet, often a cat) as 'Haki-mi'. | Misparses "哈基米" (a cute character/pet name) as "Haki-mi" and adds "he"; better as "Why is Hakimi so cute" without gender if unknown. | Incorrect translation. '哈基米' is a proper name/nickname, not transliterated as 'Haki-mi'. Should be 'Why is the hakimi so cute?' Reference indicates it's a cute pet/cat reference. | The hypothesis includes meta-text explaining the translation rather than just providing the translation. '哈基米' is a meme term for a cute cat/pet; transliterating as 'Haki-mi' without context or explanation (like the reference's 'hakimi (cute pet/cat)') reduces clarity. The content line is missing. |
| google/gemma-3-4b-it | Why is Hakim so adorable? |
严重
[准确性]
"Hakim"
理由: Hypothesis translates '哈基米' as 'Hakim' but reference indicates it refers to 'hakimi (cute pet/cat)', suggesting it's a nickname or reference to something specific, not a person's name | Mistranslation of '哈基米' (Hachimi/Hakimi, a meme term for cute pets/cats). Treating it as a human name 'Hakim' loses the meme context. | “哈基米” here refers to a cute pet/mascot (‘hakimi’), not the common Arabic name ‘Hakim’. |
| google/translategemma-12b-it | Title: Why is Hakimi so cute? Content: Hakimi is so cute. |
严重
[准确性]
"Hakimi"
理由: Failed to recognize '哈基米' as internet slang for a cute pet/cat, translating it as a proper name 'Hakimi' instead of explaining the slang or using 'cute pet'. | Source refers to a ‘hakimi’ as a cute pet/cat meme, not the footballer Achraf Hakimi; using capitalized personal name misleads. | Should clarify that 'hakimi' refers to a cute pet/cat, not the person Hakimi. The reference indicates '(cute pet/cat)' as clarification. |
| google/translategemma-4b-it | Why is Hakim so cute? |
严重
[准确性]
"Why is Hakim so cute?"
理由: Mistranslation of '哈基米' (Hachimi/Hakimi), which is internet slang for a cute cat/pet, treating it as a human name 'Hakim'. | Should be 'Why is the hakimi (cute pet/cat) so cute?' The source '哈基米' is not a person's name but a cute pet/cat reference. The hypothesis misses this context. | Omission of clarification |
| tencent/HY-MT1.5-1.8B | The title '哈基米怎么这么可爱'在英文中的翻译是 |
严重
[准确性]
"The title '哈基米怎么这么可爱'在英文中的翻译是"
理由: The translation does not match the source content. | The hypothesis outputs a sentence stating 'The translation of the title... is' but fails to actually provide the translation. It also ignores the content. | Hypothesis is placeholder text, not actual translation; appears to be meta-statement about translation rather than translation itself |
| tencent/HY-MT1.5-7B | Title: Why is Hakimi so cute? | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: Why do you need OpenClaw? Currently, I haven't seen any real use cases for OpenClaw in all the articles and conversations about it. It's just a bunch of automated information aggregation and posting work that doesn't add much value. |
严重
[准确性]
"Why do you need OpenClaw?"
理由: The source reads “为什么您根本不需要OpenClaw” (why you don't need OpenClaw at all), which is more emphatic; the translation weakens it into an ordinary question. | Contradicts source '为什么您根本不需要' (Why you don't need at all). The hypothesis asks why one *needs* it. | Incorrect translation: The title 'openclaw 龙虾🦞可以带来什么' means 'What can OpenClaw bring you?', not 'Why do you need OpenClaw?'. '龙虾' is a nickname for OpenClaw, not part of the question. | Should be 'Why you don't need OpenClaw at all.' The hypothesis reverses the meaning by removing 'don't' and 'at all'. |
| CohereLabs/tiny-aya-water | Content: Why do you need OpenClaw? Currently, I haven't seen any real use cases for OpenClaw in all the articles and discussions about it. It's just automated information aggregation and posting work that doesn't really add value. On one hand, it's limited by interface permissions... |
严重
[准确性]
"Why do you need OpenClaw?"
理由: Incorrect logic - should be 'Why you don't need OpenClaw at all' not 'Why do you need OpenClaw?' | The source says '为什么您根本不需要' (Why you don't need it at all). The hypothesis changes the meaning to 'Why do you need', which is the opposite. | Source says "为什么您根本不需要OpenClaw" (why you don't need OpenClaw at all), a stronger negation than "why do you need". |
| Qwen/Qwen2.5-14B-Instruct | Title: OpenClaw Lobster 🦞 What Can It Bring Content: Why You Don't Need OpenClaw At All. So far, I haven't seen any truly valuable use cases in all discussions about OpenClaw. They're just some trivial tasks like automatically summarizing information and posting... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: openclaw Lobster🦞 can bring what Content: Why you don't need OpenClaw at all. In all discussions and articles about openclaw so far, I haven't seen any truly valuable use cases; just some uninteresting aggregation of information and posting work, limited by API permissions... |
严重
[流畅性]
"openclaw Lobster🦞 can bring what"
理由: Incomplete/awkward: 'can bring what' is incomplete English. Should be 'what can it bring you' or 'what can OpenClaw bring you' | Unnatural sentence structure. Should be 'What can OpenClaw bring?'. | Word order is awkward; natural English would be 'What can OpenClaw (Lobster🦞) bring?' though meaning is clear. |
| Qwen/Qwen3-14B | Cosa può portare openclaw il granchio 🦞 |
严重
[准确性]
"Cosa può portare openclaw il granchio 🦞"
理由: Translates only the title and completely omits the part arguing that there is not yet a valuable use case, the interface permission limitations, etc. | The hypothesis translates only the title and omits the content. It also mistranslates '龙虾' (Lobster) as 'granchio' (Crab), contradicting the lobster emoji provided in the source. | Translation is in Italian instead of target language Chinese (zh). Complete language mismatch. Also mistranslates '龙虾' (lobster) as 'granchio' (crab). |
| Qwen/Qwen3.5-4B | Title: What can OpenClaw 🦞 bring? Content: Why you don't need OpenClaw at all. Currently, in all discussions about OpenClaw articles and conversations, I haven't seen a truly valuable use case yet; they're mostly trivial tasks like automatically aggregating information and posting content. On one hand, this is limited by API permissions... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: What can OpenClaw Lobster 🦞 bring? Content: Why you fundamentally don't need OpenClaw. So far, in all articles and discussions about OpenClaw, I haven't seen any truly valuable use cases; they're all trivial tasks like auto-summarizing information and posting, partly due to API permission limitations... | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | What can shrimp bring to you? OpenClaw: Why You Don't Need It. |
严重
[准确性]
"What can shrimp bring to you?"
理由: Title structure is wrong - should be 'What can OpenClaw bring you?' not 'What can shrimp bring to you?' | Mistranslation of '龙虾' (lobster) as 'shrimp'. Also, 'OpenClaw' is the subject, not 'shrimp'. | “龙虾” is the lobster nickname for OpenClaw, not generic ‘shrimp’; mistranslates the key term. |
| google/gemma-3-12b-it | Title: OpenClaw Lobster 🦞 - What Can It Bring? Content: Why You Don't Need OpenClaw At All. In all discussions about OpenClaw I’ve seen so far, I haven’t found a single valuable use case; they’re all superficial tasks like automated summaries and posts, constrained by interface permissions on one hand... | 通过 (无共识错误) |
| google/gemma-3-1b-it | Why do you really need OpenClaw? |
严重
[准确性]
"Why do you really need OpenClaw?"
理由: The hypothesis is in English instead of Chinese (target language is zh). It also mistranslates the title 'openclaw 龙虾🦞可以带来什么' (What can OpenClaw bring you?) as 'Why do you really need OpenClaw?', which is opposite to the reference meaning. | The source title asks 'What can OpenClaw bring?' and the content starts with 'Why you don't need OpenClaw'. The hypothesis flips the meaning to 'Why do you really need OpenClaw?', which contradicts the source text's stance. The rest of the content is missing. | Inverts the original question; source title asks what OpenClaw can bring, not why you need it. The body is about why you don't need it, but title meaning is lost. | Incomplete translation. Only translates title question, not the content. Misses the full context that the article argues against needing OpenClaw. Should be 'Why you don't need OpenClaw at all'. |
| google/gemma-3-4b-it | Title: What can openclaw 龙虾🦞 bring? | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: openclaw Lobster 🦞: What can it bring? |
严重
[准确性]
"openclaw Lobster 🦞: What can it bring?"
理由: Adds “Lobster” which is a community nickname but not in source title; otherwise preserves question meaning while dropping the “you” focus. | Literal translation of '龙虾' (Lobster) alongside OpenClaw is confusing without context; missed the Content entirely which explains the argument. | Should be 'What can OpenClaw bring you?' The source's title is '龙虾🦞可以带来什么' which translates to 'What can it bring', not 'What can it bring?' as a standalone question about capabilities. |
| google/translategemma-4b-it | Title: openclaw lobster can bring what Content: Why you don't need OpenClaw at all. In all discussions about openclaw, I haven't seen any truly valuable use cases, only automated summaries and lackluster posting work, constrained by interface permissions... |
轻微
[流畅性]
"openclaw lobster can bring what"
理由: Unnatural structure | Should be 'What can OpenClaw bring you?' The hypothesis has poor word order and grammar. The title should be a proper question. | Unnatural English word order; should be “What can OpenClaw (Lobster) bring?” or similar | Unnatural word order and literal translation of '龙虾' (lobster) which is a nickname for OpenClaw, making the title sound broken. |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾 可以带来什么' in en |
严重
[准确性]
"Translation of '龙虾 可以带来什么' in en"
理由: Hypothesis is placeholder text, not actual translation; incomplete source text (missing 'openclaw') | The hypothesis is a meta-string describing the task. It fails to provide the actual translation of the title or the content. | The translation only provides the title and not the content. |
| tencent/HY-MT1.5-7B | Title: What Can OpenClaw Lobster 🦞 Bring? | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Electric car commuting recommendations sought |
严重
[准确性]
"Electric car"
理由: Mistranslation: '电动车' (electric car/bike) should be 'electric bike' not 'electric car'. The source clearly refers to commuting by electric bike, not car. | Source refers to '电动车' (electric vehicle/bike) for short commute (2-3km), not a car. Should be 'electric bike' or 'e-scooter', not 'car'. | Mistranslation of '电动车' which refers to an electric bike/scooter in this context, not a car. Also missing the translation of the Content entirely. | Both the source and the reference refer to commuting by 电动车/电瓶车/电动自行车, which normally means an electric scooter or electric bicycle rather than a car; it should be translated as electric scooter/bike or similar. |
| CohereLabs/tiny-aya-water | Title: Electric Car Commuting Recommendations Needed |
严重
[准确性]
"Electric Car"
理由: Mistranslation: '电动车' (electric bike/scooter) was incorrectly translated as 'Electric Car'. The source clearly refers to electric bikes for commuting, not cars. | In this context “电动车” refers to an electric bicycle/scooter rather than an electric car, so the meaning deviates significantly. | Mistranslation of '电动车' which refers to an electric bike/scooter in this context, not a car. Also missing translation of Content. | Source refers to '电动车' (electric bike/scooter) for short commute (2-3km), not electric car. Hypothesis mistranslates as 'Electric Car' which is incorrect context. |
| Qwen/Qwen2.5-14B-Instruct | Title: Seeking Recommendations for Electric Commute Vehicle Content: Located in Guangzhou, distance is two to three kilometers, budget is 2k to 4k, any experienced friends who can share their experience? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Recommendations for Electric Bicycles for Commuting Content: Coordinates in Guangzhou, about 2-3 kilometers away, budget between 2k to 4k, any experienced friends who can share their experience, looking for a commuting solution | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Richiesta di consigli per il trasporto in auto elettrica |
致命
[准确性]
"Richiesta di consigli per il trasporto in auto elettrica"
理由: Omits most source details (location, distance, budget, asking bros), changes “electric bike for commuting” to generic “transport in electric car”, and drops informal tone. | Complete mistranslation: source is about electric bike (电动车) recommendations, not electric car. The target language is Chinese but hypothesis is in Italian. | Mistranslation of '电动车' (e-bike/scooter) as 'car'. In the context of a 2-3km commute and 2k-4k budget, this refers to an e-bike, not an electric car. | Hypothesis is in Italian, not Chinese. Source is in Chinese and should be translated to Chinese or English, not Italian. Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Looking for recommendations for an electric bike commute Content: Located in Guangzhou, distance is 2-3 km, budget $2000-$4000. Any experienced users who can share their experience? Seeking a commute plan. |
严重
[准确性]
"budget $2000-$4000"
理由: Currency conversion error. The source '2k 到 4k' in the context of Guangzhou implies RMB, but the hypothesis translates it as USD ($), which is a significant factual error regarding the budget. | Currency unit incorrect - should be RMB (人民币), not USD ($) | Source says '2k到4k' (2k to 4k RMB), not USD. Hypothesis incorrectly uses dollar sign without specifying currency is RMB. | Omission of currency (RMB) | Misinterprets currency; source means 2k–4k RMB, not 2000–4000 USD.
轻微
[风格]
"Any experienced users who can share their experience?"
理由: Loss of slang register. The source '佬友' is specific community slang (bros/experts), which is better rendered as 'bros' (as in reference) rather than the generic 'experienced users'. | Missing colloquial tone - '佬友' (bro) not translated, reference uses 'bros' | Repetition of "experience" sounds awkward; more natural phrasing is possible. |
| Qwen/Qwen3.5-9B | Title: Electric Vehicle Commute Recommendations Content: Based in Guangzhou, distance of 2-3 km, budget between 2k and 4k. Any experienced users who can share their insights? Looking for a commuting solution. |
轻微
[风格]
"Any experienced users who can share their insights?"
理由: Loss of slang tone. '佬友' is community slang for 'bros' or 'experts', translated too formally as 'experienced users' compared to the reference 'bros'. | Translation is more formal than source; '佬友' (bro) informal tone lost | Loses the slangy, colloquial tone of "佬友" (bros) in the source. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Electric vehicle commuting request |
严重
[准确性]
"Electric vehicle commuting request"
理由: Only translates the title and omits the main content about location (Guangzhou), distance, budget, and asking for others’ experience and a commuting plan. | Mistranslation: '电动车' (electric bike/vehicle) is translated as 'Electric vehicle', but the reference clarifies it means 'electric bike'. Also missing the content translation entirely. | Severe under-translation. The hypothesis only translates the title concept loosely and omits the entire content section regarding location, budget, and the request for advice. | Hypothesis is incomplete - only translates the title, missing the content entirely. Source content includes specific details about location, distance, budget, and request for recommendations. | The translation is too vague and does not capture the specific details from the source. |
| google/gemma-3-12b-it | Recommendation Request for Electric Vehicle Commuting |
严重
[准确性]
"Recommendation Request for Electric Vehicle Commuting"
理由: Mistranslation: '电动车' refers to electric bike/scooter, not generic 'Electric Vehicle'. Also '佬友' (bro/friend) is omitted, and '求推荐' is mistranslated as 'Recommendation Request' instead of 'Looking for recommendations' | Missing content translation; only title provided, source includes full content about commute distance, budget, and request for advice | Omits key details from source (location Guangzhou, distance, budget, asking bros for experience); only keeps a vague topic-level summary. | The hypothesis only translates the title and completely omits the entire content section of the source text. |
| google/gemma-3-1b-it | Translation of '电动车通勤求推荐' in en |
严重
[准确性]
"Translation of '电动车通勤求推荐' in en"
理由: Not an actual translation; it is a meta description and omits the meaning of asking for recommendations for an electric vehicle for commuting. | Hypothesis is incomplete and does not provide actual translation. Should translate to something like 'Looking for electric bike recommendations for commuting' based on source content. | The hypothesis contains a meta-instruction describing the task instead of providing the actual translation of the source text. | The hypothesis only translates the title and not the entire content, making it incomplete. | Hypothesis is not a translation but a placeholder indicating translation was not performed |
| google/gemma-3-4b-it | Looking for recommendations on electric vehicle commuting. |
严重
[准确性]
"Looking for recommendations on electric vehicle commuting."
理由: Hypothesis only translates the title and omits the content entirely. The source includes specific details (location, distance, budget, request for advice) that are completely missing. | Omits key information from title and content: location (Guangzhou), distance (2–3 km), and budget (2k–4k RMB), and the request for experienced people to share a commuting plan. | The translation incorrectly interprets 电动车 as 'electric vehicle' when it actually refers to 'electric bike' in the context of the source (广州, 通勤, 2-3km distance). The reference correctly translates it as 'electric bike'. | The hypothesis only translates the title and completely omits the entire content section (location, distance, budget, request for advice), resulting in significant loss of information. | Should be 'electric bike' instead of 'electric vehicle' as per reference. |
| google/translategemma-12b-it | Looking for recommendations for electric vehicle commuting. |
严重
[准确性]
"Looking for recommendations for electric vehicle commuting."
理由: Hypothesis only translates the title and omits the content entirely. The source includes specific details (location, distance, budget, request for advice) that are completely missing. | Omits key details from source such as location (Guangzhou), commute distance (2–3 km), budget (2k–4k), and request for others’ experiences. | The hypothesis only translates the title and completely omits the entire content section (location, distance, budget, request for advice). |
| google/translategemma-4b-it | Title: Recommendation for Electric Vehicle Commuting |
严重
[准确性]
"Title: Recommendation for Electric Vehicle Commuting"
理由: Omission of content: The hypothesis only translates the title and completely omits the body content regarding location, budget, and request for advice. | Omitted key details such as Guangzhou location, 2–3 km distance, and 2k–4k budget; also misses the informal request for experience sharing and suggestions. | Mistranslation: '电动车' (electric bike/vehicle) was translated as 'Electric Vehicle' but the context shows it's about commuting 2-3km with 2k-4k budget, which clearly refers to electric bikes (电动自行车), not cars. The reference correctly identifies it as 'electric bike'. |
| tencent/HY-MT1.5-1.8B | Electric Vehicle Commuting Recommendations |
严重
[准确性]
"Electric Vehicle Commuting Recommendations"
理由: The translation completely mistranslates the source. '电动车' in this context refers to electric bikes/scooters, not electric vehicles in general. The source is asking for commuting recommendations for an electric bike, not an electric car. | The translation is accurate but lacks specific details such as location and budget. | Omits key details from the source such as location (Guangzhou), commute distance, budget range, and request for others' experience; also misinterprets 电动车 as generic electric vehicle instead of electric bike in this context. | The hypothesis only translates the title and completely omits the content section (location, distance, budget, request for advice). Additionally, 'Electric Vehicle' is a mistranslation of '电动车' in this context, which refers to an 'e-bike' or 'scooter', not a car. |
| tencent/HY-MT1.5-7B | "en": "Title: Recommendations for Electric Vehicle Commuting", "zh-TW": "标题:电动车辆通勤推荐", "ja": "标题:电动车辆通勤建议", "ko": "标题:电动车辆通勤推荐", "de":="Titel: Empfehlungen für den Einsatz von Elektrofahrzeugen zum Pendeln", "fr": "Titre : Recommandations pour le transport quotidien en véhicule électrique", "it": "Titolo: Raccomandazioni per il commuting con veicoli elettrici"} |
严重
[准确性]
""en": "Title: Recommendations for Electric Vehicle Commuting", "zh-TW": "标题:电动车辆通勤推荐", "ja": "标题:电动车辆通勤建议", "ko": "标题:电动车辆通勤推荐", "de":="Titel: Empfehlungen für den Einsatz von Elektrofahrzeugen zum Pendeln", "fr": "Titre : Recommandations pour le transport quotidien en véhicule électrique", "it": "Titolo: Raccomandazioni per il commuting con veicoli elettrici""
理由: The hypothesis outputs a JSON list of translations for multiple languages instead of a single translation in the target language (zh). It also fails to translate the 'Content' section entirely. | The task target language is zh but the hypothesis is a multilingual field snippet, not a proper Chinese translation. It omits the content sentence about distance, budget, and asking for bros’ experience. | Mistranslation: '电动车' (electric bike/vehicle) was incorrectly translated as '电动车辆', losing the specific meaning of 'electric bike' which is the intended meaning in the source context. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: The community's Api is back alive, try $2000 dollars to test the effect, (CC, CX not supported yet, tool calls not implemented yet) With small surprises and strong return, claude 4.6/4.5 full series gemini 3.1 ,gpt-5.4 multi-modal |
轻微
[准确性]
"The community's Api"
理由: “佬友Api” is closer to “the bros' API / the old friends' API” (佬友的API/老友的API); “community” is not a serious mistranslation, but it is imprecise. | '佬友' is specific slang for 'bros' or experienced users in this community, 'community' is too generic. | Mistranslation: '佬友' (bro/friend in Chinese internet slang) was translated as 'community' instead of 'bro/bros'. The reference correctly uses 'Bro's API'. | '佬友' is slang for 'bro/buddy', not 'community'. Should be 'Bro's API' not 'The community's API'. |
| CohereLabs/tiny-aya-water | Title: The friend API is back alive, try $2000 dollars to see the effect, (CC, CX not supported yet, tool calls not implemented yet) With small surprises returning strongly Claude 4.6/4.5 full series Gemini 3.1, gpt-5.4 multi-modal |
严重
[准确性]
"The friend API"
理由: Mistranslation: '佬友' (slang for 'bro' in Chinese internet culture) was translated as 'The friend', losing the cultural nuance. Should be 'Bro's API'. | “佬友Api” is closer to “Bro's API” or similar forum slang; “friend” does not quite match the tone. | Mistranslation of slang '佬友' (bros/expert friends) as literal 'friend'.
轻微
[流畅性]
"With small surprises returning strongly Claude 4.6/4.5 full series Gemini 3.1, gpt-5.4 multi-modal"
理由: Awkward phrasing compared to 'Returning strong'. | Grammatically awkward. Reference restructures this more naturally as 'Returning strong with a small surprise: full Claude 4.6/4.5 series, Gemini 3.1, GPT-5.4 multimodal.' | The English structure is not fluent; it lacks punctuation and connecting words and reads stiffly. |
| Qwen/Qwen2.5-14B-Instruct | Title: The Friend Api is Back Online, Try It Out With $2000 (CC and CX Not Supported for Now), Strongly Returning With A Little Surprise - claude 4.6/4.5 Full Series, gemini 3.1, gpt-5.4 Multi-modal Content: api.freestyle.cc.cd New API Aggregation Management Interface, Supports Multiple Channels Including Azure, Suitable For Secondary Distribution Management Key | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Old Friends Api is Back Alive, $2000 to Try Its Effect, (CC and CX Temporarily Not Supported, Tool Invocation Temporarily Unavailable) Bringing Surprises Strongly Back claude 4.6/4.5 Full Series Gemini 3.1, gpt-5.4 Multimodal Content: api.freestyle.cc.cd New API OpenAI Interface Aggregation Management, Supports Multiple Channels Including Azure, Can Be Used for Secondary Distribution Management Key |
严重
[准确性]
"Old Friends Api"
理由: Mistranslation of '佬友' (slang for 'bros' or 'experts') as 'Old Friends'. It refers to the community members, not the age of the API. | Should be 'Bro's API' - 佬友 refers to 'bros' not 'old friends' | Misinterprets '佬友' which here means 'bros' or 'fellows', not 'Old Friends' as a proper noun for the API.
轻微
[流畅性]
"Bringing Surprises Strongly Back"
理由: Awkward phrasing. 'Strongly back' is not natural English for '强势回归'. | Awkward phrasing; should be 'Returning strong with a small surprise' |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: The hypothesis is incomplete and contains generation artifacts, failing to translate the source content. | Hypothesis is incomplete and contains only ellipsis and closing brace. Does not provide any meaningful translation of the source content. | Empty/incomplete translation output |
| Qwen/Qwen3.5-4B | Title: Lao You API is back! Try $2000 for results (CC, CX temporarily unsupported, tool calling not yet implemented). Returning strongly with small surprises to all Claude 4.6/4.5 series and Gemini 3.1, gpt-5.4 multimodal models. Content: api.freestyle.cc.cd New API OpenAI interface aggregation management, supports multiple channels including Azure, can be used for secondary distribution management keys. |
轻微
[流畅性]
"can be used for secondary distribution management keys."
理由: Awkward phrasing. 'Secondary distribution management keys' is less clear than 'secondary distribution and key management'. | Word order is odd; more natural would be "for secondary distribution and key management". | Should be 'secondary distribution and key management' to match reference |
| Qwen/Qwen3.5-9B | Title: The LaoYou API is back! Try $2000 worth for effect, (CC and CX temporarily unsupported, tool calling not yet implemented) Strongly returns with small surprises Claude 4.6/4.5 series, Gemini 3.1, GPT-5.4 multimodal Content: api.freestyle.cc.cd New API Aggregation Management for OpenAI interfaces, supports multiple channels including Azure, can be used for secondary distribution management of keys. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: The Friend's Api is Back Again, Try It for $2000, (CC, CX Temporarily Not Supported, Tool Callouts Temporarily Unset) |
轻微
[风格]
"The Friend's Api is Back Again"
理由: The translation is slightly formal and does not capture the slang tone of the source. | Translates "佬友" literally as "Friend" and drops the slangy "bro" nuance; acceptable but less natural given the reference. | Inconsistent capitalization with original slang term | '佬友' is community slang for 'bros' or 'fellow experts'. 'The Friend's' sounds unnatural and misses the community tone.
严重
[准确性]
"Try It for $2000"
理由: Mistranslation: '佬友' (bro/friend in Chinese internet slang) is incorrectly translated as 'The Friend'. Also missing the Content section entirely. | Misinterprets '$2000 刀试试效果'. It means testing with a $2000 quota/credit, not paying $2000 to try it. | Misinterprets "2000刀" which refers to a $2000 quota/credit, not a one-time try price; also omits that this is for testing performance. | Incomplete translation - missing significant content about Claude 4.6/4.5, Gemini 3.1, GPT-5.4 multimodal features and the 'small surprise' mentioned in source. Also missing the content section entirely. |
| google/gemma-3-12b-it | Title: Old Friend Api is back to life! Let's try $2000 to see the effect (CC, CX currently not supported, tool calling is temporarily unavailable) Strong return with a little surprise, supporting Claude 4.6/4.5 full series, Gemini 3.1, gpt-5.4 multimodal Content: api.freestyle.cc.cd New API OpenAI interface aggregation management, supports various channels including Azure, can be used for secondary distribution and key management. | 通过 (无共识错误) |
| google/gemma-3-1b-it | API OpenAI interface management for aggregation, supports various channels including Azure, for secondary distribution management |
轻微
[准确性]
"for secondary distribution management"
理由: Misses explicit mention of key management present in the source and reference. | The hypothesis misses some technical details present in the reference. | Hypothesis only translates part of the content (the second paragraph). It completely omits the title and first paragraph about the API being back with pricing and model information. | The hypothesis omits the specific domain name 'api.freestyle.cc.cd' and the functionality regarding 'key' management mentioned in the source and reference. |
| google/gemma-3-4b-it | Title: The old friend API is back to life! Let's test it with $2000, (LinuxDo community context, CX temporarily not supported, tool calls haven't been figured out yet). Bringing back small surprises, Claude 4.6/4.5 and Gemini 3.1, gpt-5.4 multi-modal. |
严重
[准确性]
"(LinuxDo community context"
理由: Parenthetical incorrectly adds "LinuxDo community context" which is not present in source or reference, introducing unsupported information. | Unnecessary addition not present in reference. | The hypothesis incorrectly translates 'CC' as 'LinuxDo community context'. 'CC' refers to a specific model or service (likely Claude Code) paired with 'CX'. The source lists unsupported items; the hypothesis invents an explanation. | The hypothesis adds 'LinuxDo community context' which is not present in the source. This is an unrequested addition. |
| google/translategemma-12b-it | Title: Freestyle API is back to life! Try it out for $2000 and see the results. (Currently does not support CC, CX, or tool calls). It's back with some surprises! Claude 4.6/4.5 series, Gemini 3.1, gpt-5.4 multimodal. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: The 'laoyu' API is back! Let's try $2000, (CC, CX temporarily not supported, tool calls haven't been implemented yet). Come back with small surprises, Claude 4.6/4.5 and Gemini 3.1, gpt-5.4 multi-modal. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '佬友Api又活过来啦,$2000刀试试效果,(CC,CX暂时不支持,工具调用暂时没搞)带着小惊喜强势回归 claude 4.6/4.5全系列 gemini 3.1 ,gpt-5.4 多模态" }, "zh-TW": "Translation of '佬友Api又活过来啦,$2000刀试试效果,(CC和Xbox目前不支持,工具调用暂时还没搞)带着一些小惊喜强势回归Claude 4.6/4.5版本的全系列Gemini 3.1以及GPT-5.4的多模态功能", "ja": "Translation of '佬友Api又活过来啦,$2000刀试试效果,(CCやXboxは現時点ではサポートされていないが、ツールの呼び出しはまだ実施されていない。小さなサプライズを伴ってClaude 4.6/4.5の全シリーズとGemini 3.1、GPT-5.4のマルチモーダル機能で強力に復帰する'", "ko": "Translation of '佬友Api又活过来啦,$2000刀试试效果,(CC와 Xbox는 현재 지원되지 않지만, 도구 호출은 아직 실행되지 않았습니다. 작은 선물과 함께 Claude 4.6/4.5의 전체 시리즈 및 Gemini 3.1과 GPT-5.4의 다중 모달 기능으로 강력하게 복귀합니다.'", "de": "Translation of '佬友Api又活过来啦,$2000刀试试效果,(CC und Xbox werden derzeit nicht unterstützt, die Toolaufrufung ist noch nicht durchgeführt. Mit kleinen Überraschungen kehrt es stark zurück zu Claude 4.6/4.5 der gesamten Serie sowie Gemini 3.1 und GPT-5.4 der multiplen Modulation)', "fr": "Translation of '佬友Api又活过来啦,$2000刀试试效果,(CC et Xbox ne sont pas encore supportés pour l'instant, mais la commande des outils n'est pas encore effectuée. Avec quelques petites surprises, il revient vigoureusement à Claude 4.6/4.5 de toute la série ainsi qu'à Gemini 3.1 et GPT-5.4 de la modulation multiple)', "it": "Translation of '佬友Api又活过来啦,$2000刀试试效果,(CC e Xbox non sono ancora supportati al momento, ma il call degli strumenti non è ancora stato effettuato. Con alcune piccole sorprese, torna con forza a Claude 4.6/4.5 di tutta la serie, così come a Gemini 3.1 e GPT-5.4 della modalità multiforma)' } |
严重
[准确性]
"Xbox"
理由: CX refers to a technical feature/product, not Xbox. Reference correctly identifies it as 'CX' without translation. | Mistranslates CX as Xbox, introducing incorrect terminology. | Mistranslation of 'CX' as 'Xbox'. In this technical context, CX likely refers to a specific model or service, not the gaming console. |
| tencent/HY-MT1.5-7B | Assistant |
致命
[准确性]
"Assistant"
理由: The hypothesis contains only the word 'Assistant' and fails to translate any part of the source text. | The translation is completely off and does not convey the original meaning. | The hypothesis is not a translation at all and does not convey any of the source title or content information. | No translation provided: The hypothesis contains only 'Assistant' which is not a translation of the source content. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Level 2!! Drawing 2 GPT Teams Cars~ |
严重
[准确性]
"Level 2!!"
理由: Incomplete translation; the content paragraph is missing. | Mistranslation: '车' in this context means 'seats/shared subscription slots' (as in GPT Team subscription seats), not 'cars'. The reference correctly translates this as 'shared subscription seats'. | In forum context “二级了” refers to a level-up; 'Level 2' is basically correct but would read more naturally as "Reached Level 2"
严重
[准确性]
"Drawing 2 GPT Teams Cars~"
理由: Mistranslation of slang. '车' here means a shared subscription seat/group buy, not a literal car. 'Drawing' is also slightly off for a giveaway context compared to 'Giving away'. | Should be 'Giving away 2 GPT Team shared subscription seats' to match the reference. | “GPT Team车” refers to shared subscription seats/slots; 'Cars' is a literal rendering of slang that is easily misunderstood and should be 'shared subscription seats/slots' | '车' in this context means 'shared subscription seat', not 'car'. Mistranslation of slang term. |
| CohereLabs/tiny-aya-water | Title: Level 2!! Draw 2 GPT Teams Car~ |
严重
[准确性]
"Draw 2 GPT Teams Car~"
理由: “车位” here means a shared subscription seat/slot; 'Car' mistranslates it as a vehicle, which changes the meaning. | Mistranslation: '车' in this context means 'shared subscription seat/slot', not 'car'. Should be 'seats' or 'slots'. | Should be 'Giving away 2 GPT Team shared subscription seats'. | Literal translation of '车' (car) fails to capture the slang meaning of 'shared subscription seat/group buy'. 'Draw' is also less natural than 'Giving away' for a giveaway context. | Mistranslates '抽2个GPT Team车' as 'Draw 2 GPT Teams Car'. Should be 'Giving away 2 GPT Team shared subscription seats' - '抽' means giveaway/raffle, not draw in this context, and '车' is slang for shared subscription seat. |
| Qwen/Qwen2.5-14B-Instruct | Title: Level 2!! Draw 2 GPT Team cars~ Content: First, let me share my feelings: I accidentally discovered L station before... Initially, I didn't know how to drive team cars myself and only bought them... After researching for an entire night, I succeeded by chance. |
严重
[准确性]
"Draw 2 GPT Team cars~"
理由: Should be 'subscription seats' instead of 'cars' | Literal translation of '车' (car) fails to capture the slang meaning of 'shared subscription seat/account'. 'Draw' is also less natural than 'Giving away' in this context. | '抽2个GPT Team车' means giving away / raffling 2 shared subscription seats, not literal 'cars'. |
| Qwen/Qwen2.5-7B-Instruct | Title: Level 2 now!! Draw 2 GPT Team cars~ Content: To be honest: I stumbled upon L station earlier... Originally, I never drove team cars myself, always just buying them... After researching all night, I managed to do it by accident |
严重
[准确性]
"抽2个GPT Team车~"
理由: Should be 'Giving away 2 GPT Team shared subscription seats' - 'draw' and 'cars' are incorrect metaphors | 'Draw 2 GPT Team cars' omits the 'shared subscription seat/slot' nuance; 'cars' is incorrect here for the metaphorical '车' meaning shared plan. | Mistranslation of slang '车' (shared ride/subscription) as literal 'cars'. In this context, it refers to shared accounts.
严重
[准确性]
"L station"
理由: 'L station' is incorrect; should be 'LINUX DO' | Leaves 'L站' unexplained; reference expands it (LINUX DO). At least 'L-site' or similar would keep the meaning of a website/platform. | Failure to translate the specific proper noun/slang 'L 站' (LINUX DO forum), rendering it meaningless. | Should be 'I accidentally discovered LINUX DO' - L站 refers to LINUX DO, not a generic 'L station'
严重
[准确性]
"never drove team cars myself, always just buying them"
理由: Continuation of the 'car' metaphor error. Should be 'hosted' or 'managed' shared subscriptions. | '开team车' refers to hosting/organizing a shared subscription, not literally 'driving team cars'. | Should be 'I didn't know how to host a Team shared subscription' - 'drove team cars' is incorrect metaphor |
| Qwen/Qwen3-14B | Livello 2!! Estrai 2 auto GPT Team~ |
轻微
[准确性]
"Livello 2!! Estrai 2 auto GPT Team~"
理由: “抽2个GPT Team车” means giving away 2 GPT Team shared seats; “estrai 2 auto GPT Team” suggests drawing 2 cars, mistranslating “车位/车” as actual cars. | Wrong language: hypothesis is in Italian but target language is Chinese. Also 'Estrai' (extract) is incorrect - should be 'giving away' or 'raffling'. | Mistranslation of '抽' (draw/giveaway) as 'extract' and '车' (slang for shared subscription seat) as 'cars'. The meaning is completely lost. | Hypothesis is in Italian, not Chinese. Should be in Chinese or English. 'Estrai' (extract) is incorrect; should convey 'giving away' or 'drawing'. Complete language and content mismatch. |
| Qwen/Qwen3.5-4B | Level 2!! Win 2 GPT Team cars~ First, let me share my feelings: I recently stumbled upon L-Station... I was originally not familiar with driving team cars and had to buy them... After researching all night, I accidentally succeeded. |
严重
[准确性]
"Win 2 GPT Team cars~"
理由: Misinterprets "车"; here it means shared subscription seats/slots, not literal cars. | Literal translation of slang. '车' (car) here is slang for 'shared subscription/seat', not a vehicle. Should be 'seats' or 'subscriptions'. | Incorrect translation of 'shared subscription seats' | Misses key context: should be 'Reached Level 2!! Giving away 2 GPT Team shared subscription seats' - this is about giving away, not winning
严重
[准确性]
"L-Station"
理由: Failure to resolve entity. 'L 站' is a specific nickname for 'LINUX DO' (as seen in reference). 'L-Station' is meaningless to the target audience. | Should be 'LINUX DO' not 'L-Station' - incorrect identification of the platform | Source "L站" refers to a specific site (in ref, "LINUX DO"), not literally "L-Station". | Incorrect - should be 'LINUX DO' (L站 is a forum abbreviation) |
| Qwen/Qwen3.5-9B | Title: It's Level 2!! Drawing 2 GPT Team cars~ Content: First, let me share my experience: I happened to learn about L-station... Originally, I didn't know how to drive team cars either; I just bought them... After researching all night, I accidentally succeeded. |
严重
[准确性]
"Drawing 2 GPT Team cars"
理由: Misses the context of 'giving away'. Should be 'Giving away 2 GPT Team shared subscription seats' as in reference, not 'Drawing'. | "抽2个GPT Team车" means giving away / lottery for 2 shared subscription seats, not literal "cars". | Severe mistranslation of slang. '车' (car) in this context means a 'shared subscription ride/group buy', not a physical vehicle. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Second level!! Win 2 GPT Team cars~ Content: Let me share my experience: I accidentally learned about L station... I didn't know how to open team car before; I used to buy them... I studied for a whole night and somehow managed to succeed. |
严重
[准确性]
"Title: Second level!! Win 2 GPT Team cars~"
理由: Inaccurate translation of '二级' and '抽' - should be 'Reached Level 2' and 'Giving away' not 'Win'. Also 'team car' is imprecise terminology. | "二级了" refers to reaching Level 2, not "Second level" as an isolated statement; "抽2个GPT Team车" means giving away via lottery/raffle seats in a shared subscription, not literal "cars". | The translation incorrectly translates '抽2个GPT Team车' as 'Win 2 GPT Team cars', which is inaccurate. | Mistranslation of slang. '车' here means 'shared subscription seat/spot', not a physical 'car'. 'Win' is also slightly off; the user is giving them away. | Mistranslation: '车位' (shared subscription seat) is incorrectly translated as 'cars'. Also '二级' (reached level 2) is translated as 'Second level' which is awkward; '抽' (giveaway/draw) should be 'giveaway' not 'win'.
严重
[准确性]
"I accidentally learned about L station"
理由: Incorrect translation: 'L站' refers to 'LINUX DO' (a forum), not 'L station'. | 'L 站' is a specific nickname for 'LINUX DO'. 'L station' is too literal and loses the reference. | 'L站' refers to 'LINUX DO', not 'L station'. This is a specific platform name that should be preserved. |
| google/gemma-3-12b-it | Title: Level 2!! Draw 2 GPT Team cars~! Content: Let me share my experience: I recently learned about L station... I never used to lead a team car myself, I always bought them... But after researching for one night, I accidentally succeeded. |
严重
[准确性]
"I recently learned about L station"
理由: Incorrect; 'L站' refers to 'LINUX DO', not 'L station' | Incorrect translation: 'L站' refers to 'LINUX DO', not 'L station' | Failure to translate the specific proper noun 'L 站' (LINUX DO), rendering it meaningless. | Source implies “偶然了解了L站” (LinuxDo); “L station” is vague and loses specific reference reproduced in the reference.
严重
[准确性]
"Draw 2 GPT Team cars~!"
理由: Literal translation of '车' (car) fails to capture the slang meaning of 'shared subscription seats' or 'spots' in this context. | Incorrect translation; should be 'Giving away' not 'Draw', and 'shared subscription seats' not 'cars' | Mistranslation: '车' in this context means shared subscription seat, not 'car'. '抽' means giving away/raffling, not 'draw' | The term 'draw' is incorrect; it should refer to 'giving away' or 'sharing subscription seats.' |
| google/gemma-3-1b-it | Title: Two-Level Did! Grab 2 GPT Team Cars! |
严重
[准确性]
"Two-Level Did!"
理由: Mistranslation of '二级了' - should be 'Reached Level 2' not 'Two-Level Did'. Also 'Grab' is inaccurate for '抽' (giving away/drawing). | Mistranslation of '二级了' (Reached Level 2). The phrase 'Two-Level Did!' is nonsensical in English. | Incorrectly renders 升到二级用户等级 as “Two-Level Did!”, which is ungrammatical and unclear in English. | The hypothesis only translates the title and not the entire content, making it incomplete. | Incorrect translation - '二级' means 'reached level 2' not 'two-level', '车' is slang for shared subscription seat not literal car |
| google/gemma-3-4b-it | Level 2 already!! Draw 2 GPT Team cars! |
严重
[准确性]
"Level 2 already!! Draw 2 GPT Team cars!"
理由: "抽2个GPT Team车" refers to giving away/raffling 2 GPT Team shared seats, not literal "cars"; mistranslation of slang "车" for shared subscription, and misses nuance of giveaway/lottery. | Should be 'Giving away 2 GPT Team shared subscription seats' as per reference. | Hypothesis only translates title and completely omits the content. Also 'Draw' is inaccurate; should be 'Giving away' or similar. |
| google/translategemma-12b-it | Title: Level 2!! Drawing 2 GPT Team cars~ Content: First, let me share my experience: I recently learned about L station... I usually don't create team cars myself; I buy them. But after researching for one night, I accidentally succeeded. |
严重
[准确性]
"Drawing 2 GPT Team cars~"
理由: Literal mistranslation of slang. '车' (car) here means a shared subscription seat/group buy, not a physical vehicle. 'Drawing' is also slightly off for '抽' (giveaway/raffle). | Should be 'Giving away 2 GPT Team shared subscription seats' | Mistranslates '抽2个GPT Team车' as 'Drawing' when it means 'Giving away' or 'Raffling'. Also misses the context of 'shared subscription seats'. | “抽2个GPT Team车” means giving away / lottery for 2 GPT Team shared subscription seats, not literal ‘cars’. |
| google/translategemma-4b-it | Title: Level 2! Draw 2 GPT Team Cars! |
严重
[准确性]
"Draw 2 GPT Team Cars"
理由: '抽' here means 'give away/raffle seats' for a shared subscription, not literal 'cars'. | The translation does not clearly convey the giveaway aspect. | Should be 'Giving away 2 GPT Team shared subscription seats' - 'draw' misrepresents the giveaway context | Mistranslation of slang: '车' in this context refers to a shared subscription seat or ride, not a physical 'Car'. |
| tencent/HY-MT1.5-1.8B | Translation of '二级了!!抽2个GPT Team车~' in en |
致命
[准确性]
"Translation of '二级了!!抽2个GPT Team车~' in en"
理由: No actual translation is provided; it only describes that something is a translation. Also misses nuance that 抽 indicates giving away/lottery of two GPT Team shared seats. | The hypothesis contains no actual translation, only a placeholder description. It also omits the content section entirely. | The hypothesis is not a proper translation but a placeholder indicating translation to English. It fails to provide the actual translated content. | The translation lacks context and clarity compared to the reference. |
| tencent/HY-MT1.5-7B | "en": "Title: Level 2!! Draw 2 GPT Team cars~, ", "zh-TW": "标题:二级了!!抽取2辆GPT Team汽车~", "ja": "タイトル:レベル2です!!GPT Teamカーを2台引き当てます~", "ko": "제목: 레벨 2입니다!! GPT Team 차량을 2대 추첨합니다~", "de":="Titel: Level 2!! Ziehen Sie 2 GPT Team Autos aus~, ", "fr": "Titre : Niveau 2 !! Tirer au sort 2 voitures GPT Team~, ", "it": "Titolo: Livello 2!! Estrai 2 auto GPT Team~"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Are there any warriors who continue to defect after defection is unlocked? |
致命
[准确性]
"Are there any warriors who continue to defect after defection is unlocked?"
理由: Severe mistranslation. '反代' is short for 'reverse proxy' (technical term), not 'defect' (betrayal/leaving). This changes the entire meaning. | '反代' means 'reverse proxy', not 'defect'. Completely incorrect translation of technical term. | Mistranslation: '反代' (reverse proxy) was translated as 'defection', and '反重力' (Antigravity, a VPN/service name) was completely lost. The reference correctly uses 'reverse proxying' and 'Antigravity was unbanned'. |
| CohereLabs/tiny-aya-water | Title: Are there any brave warriors who continue to counter-replace after being unblocked? Content: After signing the commitment letter, I was unblocked. Are there any who continue to counter-replace? If you want to know the real story, that's how it is said. It's not like going against and getting banned. |
严重
[准确性]
"continue to counter-replace"
理由: Mistranslation: '反代' (reverse proxy) was incorrectly translated as 'counter-replace'. The term '反重力' (Antigravity, a VPN service) was also mistranslated as just 'being unblocked'. | Complete mistranslation of '反代' (reverse proxy). 'Counter-replace' is nonsensical in this technical context. | “反代” means 'reverse proxy' (反向代理), not 'counter-replace'; the technical meaning is wrong.
严重
[准确性]
"If you want to know the real story, that's how it is said. It's not like going against and getting banned."
理由: Incomplete and inaccurate translation. Source '不反代和被封好像也没去别' is poorly translated. Reference correctly interprets as 'not reverse proxying feels no different from being banned.' | The source means “不反代和被封好像也没区别” (not reverse proxying seems no different from being banned); the translation is unclear and contradicts the source's conclusion. | Garbled translation of the final sentence which compares not using the proxy to being banned. |
| Qwen/Qwen2.5-14B-Instruct | Title: Is there any brave soul who continued to use reverse proxy after Anti-Gravity was unlocked? Content: After signing the commitment letter and getting unblocked, is there anyone who continued using reverse proxy? I want to know about the real aftermath. Not using reverse proxy or being blocked doesn't seem to have led them anywhere else. |
严重
[准确性]
"doesn't seem to have led them anywhere else"
理由: Original means 'feels no different from being banned'; translation changes the meaning to 'didn't lead them anywhere else'. | Completely mistranslates '不反代和被封好像也没去别' (Not using reverse proxy feels no different from being banned). The hypothesis suggests a lack of consequence, whereas the source means the outcome is equally bad/useless. | Incorrect interpretation; should be 'not reverse proxying feels no different from being banned' - the source conveys equivalence, not a comparison of outcomes
轻微
[术语]
"Is there any brave soul who continued to use reverse proxy after Anti-Gravity was unlocked?"
理由: Should be 'unbanned' not 'unlocked'; more accurate for the context of account restrictions | Should be 'Antigravity' with correct capitalization | While '反重力' literally means anti-gravity, in this specific internet slang context (often referring to bypassing restrictions), it might be a specific codename. However, the bigger issue is the next error. |
| Qwen/Qwen2.5-7B-Instruct | Title: Is There Any Hero Who Continued Anti-Gravity After Unlocking? Content: After unlocking due to signing the pledge, is there anyone who continued anti-proxy? Want to know the real follow-up, so just say this, not anti-proxy and being banned doesn't seem to go elsewhere either. |
严重
[准确性]
"Anti-Gravity"
理由: Mistranslation of the proper noun '反重力' (Antigravity, a specific service name) as the physical concept. While the characters match, in this context it is a name. | Misinterprets '反重力'; in context it is a proper service/product name 'Antigravity', not the literal physical concept. | Should be 'Antigravity' - refers to a specific service name, not a literal concept |
| Qwen/Qwen3-14B | C'è qualche valoroso che continua a combattere contro la forza di gravità dopo essere stato sbloccato? |
致命
[准确性]
"C'è qualche valoroso che continua a combattere contro la forza di gravità dopo essere stato sbloccato?"
理由: Misreads “反重力” (Antigravity service) as literal gravity, and omits core idea of continuing reverse proxying after unban. | Wrong language: hypothesis is in Italian but target language is Chinese. 'Forza di gravità' (gravity) is incorrect - should be 'reverse proxy' (反代). Also missing context about 'Antigravity'. | Hypothesis is in Italian, not Chinese. Mistranslates '反重力' (Antigravity - a service name) as literal 'gravity'. Complete language mismatch. | Literal translation of '反重力' (Antigravity, a specific service name) instead of recognizing it as a proper noun. Also missed '反代' (reverse proxy). |
| Qwen/Qwen3.5-4B | Title: Are there any brave souls who continue to bypass restrictions after being unblocked following an anti-gravity commitment? Content: After signing the commitment letter and getting unblocked, are there anyone still continuing to bypass restrictions? I want to know the real outcome. Just saying this, not bypassing restrictions seems just as bad as being blocked. |
严重
[准确性]
"anti-gravity"
理由: Mistranslation of proper noun/slang. '反重力' (Anti-gravity) is likely a specific codename for a service or user group in this context, not a physics concept. 'Anti-gravity commitment' makes no sense. | "反重力" here is a proper name (Antigravity service), not literal anti-gravity. | Misinterprets source: should be 'reverse proxying after Antigravity was unbanned' - 'anti-gravity commitment' is incorrect |
| Qwen/Qwen3.5-9B | Title: Are there any brave souls who continued to reverse proxy after being unblocked from anti-gravity? Content: After signing the commitment letter and getting unblocked, did anyone continue to reverse proxy? I want to know the real follow-up. That's all I can say; not reversing proxy and getting banned doesn't seem to have stopped them. |
严重
[准确性]
"anti-gravity"
理由: Literal mistranslation of slang. '反重力' is a codename for a specific proxy service/provider, not the physical phenomenon. | Awkward phrasing. Should be 'Are there any brave souls who continued reverse proxying after Antigravity was unbanned?' The reference is clearer. | Misinterprets "反重力" (Antigravity, a proper service/site name) as a literal concept instead of the service name.
严重
[准确性]
"doesn't seem to have stopped them"
理由: Incorrect interpretation; source says 'not reverse proxying and getting banned feels no different' - meaning there's no point in not reverse proxying since you'll get banned anyway | Misinterprets the meaning. Should be 'feels no different from being banned' as in reference. | Misinterpretation of '没去别' (no difference). The source means 'not using it feels no different from being banned', whereas the hypothesis implies an action wasn't stopped. | Source meaning is "not reverse proxying feels no different from being banned"; hypothesis changes this to "didn't stop them", reversing the comparison. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Have you continued to counter after being unblocked due to anti-gravity? |
严重
[准确性]
"Have you continued to counter after being unblocked due to anti-gravity?"
理由: Mistranslation: '反代' (reverse proxy) is translated as 'counter', and '反重力' (Antigravity, a VPN service name) is translated literally as 'anti-gravity'. The reference shows these are proper nouns/technical terms. | Mistranslation of '反代'. In this context, it stands for 'reverse proxy', not 'counter'. | Completely misinterprets "反重力" (a service name) and "反代" (reverse proxy). Correct meaning is about continuing reverse proxying after the Antigravity service unban; current translation is semantically wrong. | Completely incorrect translation. '反重力' is 'Antigravity' (a service name), not 'anti-gravity'. '反代' means 'reverse proxying', not 'counter'. Missing content section entirely. | The translation is unclear and does not match the reference. |
| google/gemma-3-12b-it | Title: Are there any brave warriors who continue to defy gravity after unlocking it? Content: I unlocked it after signing the terms and conditions, is anyone still defying gravity? I want to know the real follow-up; just say so. It doesn't seem like those who don't defy gravity or get banned have gone elsewhere. |
致命
[准确性]
"defy gravity"
理由: Misinterprets “反重力/反代” which refers to Antigravity/reverse proxying, not literal gravity; completely changes topic. | Completely incorrect; '反重力' is 'Antigravity' (a service name), not literal 'defy gravity'; '反代' means 'reverse proxying', not 'defy gravity' | Literal translation of '反重力' (Antigravity, a specific service name) instead of recognizing it as a proper noun, leading to nonsensical meaning. | The term 'defy gravity' is a mistranslation; it should be 'reverse proxying.' |
| google/gemma-3-1b-it | Translation of 'Title: 有没有反重力解封后继续反代的勇士' in en |
严重
[准确性]
"Translation of 'Title: 有没有反重力解封后继续反代的勇士' in en"
理由: The hypothesis contains a meta-instruction describing the task instead of providing the actual translation of the source text. | Meta text instead of an English translation; drops key terms like Antigravity and reverse proxying. | The hypothesis only translates the title and not the entire content, making it incomplete. | Hypothesis is not a translation but a placeholder indicating translation was not performed | Hypothesis is incomplete placeholder text. Should provide actual translation like 'Are there any brave souls who continued reverse proxying after Antigravity was unbanned?' |
| google/gemma-3-4b-it | Title: Are there any heroes who continue to reverse proxy after anti-gravity unlock? |
严重
[术语]
"anti-gravity unlock"
理由: Should be 'Antigravity was unbanned' as per reference. | '反重力' (Anti-gravity) is likely a specific codename for a service or tool in this community. While literally correct, without context it sounds nonsensical compared to the reference's treatment of it as a proper noun 'Antigravity'. However, the bigger issue is the missing content. | Incorrectly translates 反重力 as 'anti-gravity' when it should be 'Antigravity' (the name of a VPN/proxy service). This is a proper noun referring to a specific service. |
| google/translategemma-12b-it | Title: Are there any brave souls who continued to use anti-gravity after unbanning? Content: I got unbanned after signing the agreement, but is anyone still using anti-gravity? I want to know about real experiences; just saying that not using it or getting banned doesn't mean you've gone elsewhere. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Any brave warriors who continued to DDoS after anti-gravity release? |
致命
[准确性]
"continued to DDoS"
理由: '反代' means 'reverse proxy'; mistranslated as 'DDoS', changing the core meaning. | Should be 'reverse proxying' - DDoS is completely incorrect and changes the meaning to illegal activity | Incorrect translation of '反代' as DDoS instead of reverse proxying. | Hallucination/Wrong term: '反代' means 'reverse proxy', not 'DDoS' (Distributed Denial of Service). |
| tencent/HY-MT1.5-1.8B | Translation of '有没有反重力解封后继续反代的勇士' in en |
致命
[准确性]
"Translation of '有没有反重力解封后继续反代的勇士' in en"
理由: The hypothesis contains no actual translation, only a placeholder description. It also omits the content section entirely. | The translation lacks context and clarity compared to the reference. | The hypothesis is not a proper translation but a placeholder indicating translation to English. It fails to provide the actual translated content. | Contains no real translation, only a meta description. Also does not reflect the specific reference to the Antigravity service and "brave souls" who continue reverse proxying. |
| tencent/HY-MT1.5-7B | Are there any warriors who continue to engage in anti-gravity activities after being unblocked? |
严重
[准确性]
"anti-gravity"
理由: Mistranslation of slang. '反重力' (Anti-gravity) is a codename for a specific proxy service, not literal anti-gravity physics. '反代' should be 'reverse proxying', not implied as part of anti-gravity activities. | Hypothesis translates '反重力' as 'anti-gravity' but reference shows it should be 'Antigravity' (proper noun - a service name). Also '反代' means 'reverse proxying', not 'anti-gravity activities'. | Misreads “反重力” which in context refers to the Antigravity service / reverse proxy situation; also omits mention of reverse proxying and commitment letter context. | Mistranslation: '反重力' (anti-gravity) is incorrect; the source refers to '反代' (reverse proxy). This is a completely wrong translation of the key term. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【补档】CCG v1.7.71 : Claude Code 编排三 CLI 协作 | Codex + Gemini + Claude | 加入了grok mcp | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Content: Because the post was deleted due to using AIGC without screenshots, manually edited and added this post. | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: [Catch-up] CCG v1.7.71: Claude Code Arrangement III CLI Collaboration | Codex + Gemini + Claude | Added grok mcp Content: The post was deleted because AIGC was used without screenshots at that time, manually updating to repost |
轻微
[流畅性]
"manually updating to repost"
理由: Awkward phrasing. 'Manually editing and re-uploading' is clearer. | Slightly awkward phrasing; 'manually editing and reposting' would be more natural. | Should be 'manually editing and re-uploading the post' for better clarity and completeness |
| Qwen/Qwen2.5-7B-Instruct | Title: [Backlog] CCG v1.7.71 : Claude Code Arrangement III CLI Collaboration | Codex + Gemini + Claude | Added grok mcp Content: Because the post was made using AIGC without screenshots at the time, the post was deleted, so manually edited and reposted. |
轻微
[术语]
"Backlog"
理由: Less accurate than 'Re-upload' or 'Repost' for '补档' in this forum context. | Should be 'Re-upload' - 补档 means re-uploading, not backlog | Should be 'Re-upload' instead of 'Backlog'.
轻微
[流畅性]
"manually edited and reposted"
理由: Should be 'manually editing and re-uploading the post' | Should be 'manually editing and re-uploading the post' - tense and structure issue | Slightly awkward phrasing compared to 'manually editing and re-uploading'. |
| Qwen/Qwen3-14B | Title: [Backlog] CCG v1.7.71: Claude Code Formatting Three CLI Collaboration | Codex + Gemini + Claude | Added grok mcp | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: [Backup] CCG v1.7.71: Claude Code Orchestrates Three CLI Collaborations | Codex + Gemini + Claude | Added grok MCP Content: The post was deleted because I used AIGC but didn't take screenshots at the time. I manually created a new post to replace it. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [Archive] CCG v1.7.71: Claude Code Orchestration III CLI Collaboration | Codex + Gemini + Claude | Added grok mcp Content: The post was deleted because I used AIGC when posting but didn't include screenshots, so I manually edited the post to add one. |
轻微
[准确性]
"add one"
理由: Source says "补个帖子" (re-upload the post), not specifically adding a single screenshot; reference clarifies intent is re-upload/edit the post, not just "add one". | Should be 'manually editing and re-uploading the post' - source implies reposting, not just adding a screenshot | Misinterpretation. '补个帖子' means 're-post the thread' (because the original was deleted), not just editing the existing post to add a screenshot. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Supplement] CCG v1.7.71 : Claude Code Compilation Three CLI Collaboration | Codex + Gemini + Claude | Added grok mcp | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: [Update] CCG v1.7.71: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude | Added grok mcp Content: The post was deleted because it used AIGC but didn't include screenshots. I manually edited and reposted. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Due to using AIGC during that time, there were no screenshots, leading to the post being deleted. I manually edited the post to add a new post. |
轻微
[准确性]
"to add a new post"
理由: Slightly misleads; the user is re-uploading/editing the original post rather than creating an entirely new one. | The hypothesis misses some details present in the reference. | Inaccurate translation of '补个帖子' - should be 're-uploading the post' not 'add a new post'. The context indicates reposting the same content, not creating a new one. | Mistranslation of '补个帖子'. In this context, it means to re-upload or restore the deleted post, not to create a completely new unrelated post. The reference correctly uses 're-uploading'. |
| google/gemma-3-4b-it | Title: [Addendum] CCG v1.7.71: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude | Added grok mcp | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [Update] CCG v1.7.71: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude | Added grok mcp Content: Because I used AI to generate content but didn't include screenshots when posting previously, the post was deleted. Here's an updated post. | 通过 (无共识错误) |
| google/translategemma-4b-it | "en": "Title: [Addendum] CCG v1.7.71: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude | Added grok mcp", "zh-TW": "標題:[補充] CCG v1.7.71:Claude Code 協調三 CLI 合作 | Codex + Gemini + Claude | 已加入 grok mcp", "ja": "タイトル:[補足] CCG v1.7.71:Claude Code の 3 つの CLI コラボレーション | Codex + Gemini + Claude | grok mcp を追加", "ko": "제목: [부록] CCG v1.7.71: Claude Code 3개의 CLI 협업 | Codex + Gemini + Claude | grok mcp 추가", "de": "Titel: \[Ergänzung] CCG v1.7.71: Claude Code Orchestrierung Drei CLI Zusammenarbeit | Codex + Gemini + Claude | grok mcp hinzugefügt", "fr": "Titre: \[Ajout] CCG v1.7.71: Claude Code Orchestration Trois CLI Collaboration | Codex + Gemini + Claude | grok mcp ajouté", "it": "Titolo: \[Allegato] CCG v1.7.71: Claude Code Orchestrazione Tre CLI Collaborazione | Codex + Gemini + Claude | grok mcp aggiunto" } <|im_end|> | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '补档' with LinuxDo community context", "zh-TW": "翻译以LinuxDo社区背景下的‘补档’”, "ja": "翻译以LinuxDo社区为背景的‘補档’”, "ko": "翻译以LinuxDo社区为背景的‘보정’”, "de": "Translation of ‘Bearbeitung’ in the context of the LinuxDo community", "fr": "Translation of ‘Réparation’ in the context of the LinuxDo community", "it": "Translation of ‘Correzione’ in the context of the LinuxDo community" } |
致命
[准确性]
"Translation of '补档' with LinuxDo community context"
理由: The hypothesis fails to translate the title or content, providing only metadata placeholders in multiple languages. | The translation lacks context and clarity compared to the reference. | The hypothesis is not a proper translation but a placeholder showing multiple language translations. It fails to provide the actual translated content in the target language (zh). |
| tencent/HY-MT1.5-7B | Title: [Backlog] CCG v1.7.71: Claude Code Arrangement Three CLI Collaboration | Codex + Gemini + Claude | Added grok mcp | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Cherry Studio can quickly adopt Little Crabs |
严重
[准确性]
"Little Crabs"
理由: Mistranslation: '小龙虾' is the Chinese internet slang for 'OpenClaw' (an AI tool), not literal 'Little Crabs'. The reference correctly uses 'OpenClaw'. | '小龙虾' is slang for 'OpenClaw', not literal 'little crabs'. Should be 'OpenClaw' or 'adopt/deploy OpenClaw'. | Mistranslation of slang. '小龙虾' refers to the specific tool 'OpenClaw', not literal little crabs. | In context, “小龙虾” is a nickname for the OpenClaw project, not literal little crabs; 'crab' is also the wrong species for 虾 (shrimp/crayfish), so the translation should keep OpenClaw or a lobster-style nickname. |
| CohereLabs/tiny-aya-water | Title: Cherry Studio can quickly adopt small crabs |
致命
[准确性]
"Cherry Studio can quickly adopt small crabs"
理由: Mistranslation: '小龙虾' is the Chinese internet slang for 'OpenClaw' (an AI tool), not literal 'small crabs'. This is a critical mistranslation of the technical term. | Literal translation of '小龙虾' (crayfish/lobster) which is slang for 'OpenClaw'. 'Small crabs' loses the specific technical reference. | Completely incorrect translation. '小龙虾' (xiaolongxia) is slang for OpenClaw in this community context, not literal crabs. Should be 'Cherry Studio can quickly adopt/deploy OpenClaw'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Cherry Studio can quickly adopt baby lobsters Content: Mac is smoother, Windows didn't succeed | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Cherry studio can quickly adopt freshwater shrimp Content: Mac is smoother, Windows failed to install |
致命
[准确性]
"freshwater shrimp"
理由: Should be 'adopt/deploy OpenClaw' - 小龙虾 is slang for OpenClaw, not literal shrimp | Mistranslation of '小龙虾' (OpenClaw, a software tool) as literal 'freshwater shrimp'. 'Adopt' is also a mistranslation of '领养' which means 'deploy' or 'set up' in this tech context. | Misinterprets slang '小龙虾' which here refers to OpenClaw software, not actual shrimp. |
| Qwen/Qwen3-14B | Cherry studio può adottare rapidamente i granchi |
轻微
[准确性]
"i granchi"
理由: Mistranslation of '小龙虾' (OpenClaw, a specific software project) as literal 'crabs'. '领养' in this context means deploy/install, not adopt an animal. | “小龙虾/龙虾” in this context refers to OpenClaw software, not literal crabs; should preserve product name or metaphor consistently. | Hypothesis is in Italian, not Chinese. '小龙虾' refers to OpenClaw (a tool), not crabs. Complete language and terminology mismatch. | Wrong language: hypothesis is in Italian but target language is Chinese. 'Granchi' (crabs) is incorrect - should be 'OpenClaw' (小龙虾/lobster). Also 'adottare' (adopt) is wrong - should be 'deploy'. |
| Qwen/Qwen3.5-4B | Title: Cherry Studio can quickly adopt crayfish Content: Smoother on Mac, Windows failed |
严重
[术语]
"crayfish"
理由: Should be 'OpenClaw' | Incorrect - '小龙虾' is slang for 'OpenClaw', not literal crayfish | Should be 'OpenClaw' not 'crayfish' - '小龙虾' is a slang term for OpenClaw in this context, not literal crayfish |
| Qwen/Qwen3.5-9B | Title: Cherry studio allows quick adoption of crayfish Content: Mac is smoother, Windows failed to succeed |
严重
[术语]
"crayfish"
理由: 'Crayfish' is literal translation; '小龙虾' is slang for 'OpenClaw' (AI agent deployment) | Should be 'OpenClaw' | In this context "小龙虾" is community slang for the OpenClaw project, not literal crayfish.
轻微
[流畅性]
"failed to succeed"
理由: Redundant phrasing; "didn't work" or "failed" would be more natural. | Awkward phrasing. Should be 'it didn't succeed on Windows' as in reference. | Should be 'didn't succeed on Windows' - more natural |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Cherry Studio can quickly adopt small shrimp |
严重
[准确性]
"small shrimp"
理由: '小龙虾' is a slang term for 'OpenClaw', not literal 'small shrimp'. Hypothesis completely misses the content section about Mac/Windows performance. | Mistranslation: '小龙虾' (OpenClaw, an AI agent) is translated literally as 'small shrimp' instead of using the proper name 'OpenClaw'. | "小龙虾" here is a nickname for the OpenClaw project, not literal small shrimp; translation loses the product reference. | Mistranslation of slang. '小龙虾' (crayfish) is a nickname for 'OpenClaw' (an AI tool). Translating it literally as 'small shrimp' makes the sentence nonsensical. |
| google/gemma-3-12b-it | Title: Cherry studio can quickly adopt small lobsters Content: Mac is smoother, Windows was not successful |
轻微
[准确性]
"adopt small lobsters"
理由: The term 'small lobsters' is incorrect; it should be 'OpenClaw.' | Literal translation of '小龙虾' (OpenClaw slang) as the animal 'small lobsters' completely loses the technical meaning of the software tool. | Incorrect; '小龙虾' is slang for 'OpenClaw', not literal 'small lobsters'; should be 'Cherry Studio can quickly adopt/deploy OpenClaw' | “小龙虾” here is the OpenClaw project, not literal lobsters; misses key branded/technical meaning. |
| google/gemma-3-1b-it | Cherry studio is very smooth and easy to adopt small shrimp. |
严重
[准确性]
"easy to adopt small shrimp"
理由: Critical mistranslation - '小龙虾' is slang for OpenClaw (AI agent), not literal small shrimp | Misinterprets “小龙虾” which refers to OpenClaw software, not literal shrimp; also omits Mac/Windows behavior contrast. | The hypothesis only translates the title and not the entire content, making it incomplete. | Completely incorrect translation. '小龙虾' is a slang term for OpenClaw, not literal shrimp. Should be 'Cherry Studio can quickly adopt/deploy OpenClaw'. Also misses that only the Mac part is smooth. |
| google/gemma-3-4b-it | Title: Cherry studio can quickly adopt small lobsters Content: Mac is smoother, Windows failed. |
严重
[准确性]
"small lobsters"
理由: Mistranslates '小龙虾' (slang for OpenClaw) as 'small lobsters' instead of 'OpenClaw'. Should be 'adopt/deploy OpenClaw'. | Should be 'OpenClaw' as per reference. | "小龙虾" is community slang referring to OpenClaw, not literal lobsters; mistranslation loses the product/tool reference. |
| google/translategemma-12b-it | Title: Cherry studio - Quickly adopt your little crawfish. |
致命
[准确性]
"Quickly adopt your little crawfish."
理由: Complete failure to recognize '小龙虾' (little crayfish) as the slang nickname for the software 'OpenClaw'. Translating it literally makes the sentence nonsensical in a tech context. | Hypothesis is incomplete - missing the content entirely. Also mistranslates '小龙虾' (slang for OpenClaw) as literal 'crawfish' instead of recognizing it as a product reference. | “小龙虾” is community slang for OpenClaw software, not literal crawfish; also omits note about Mac smooth / Windows failure from content. |
| google/translategemma-4b-it | Title: Cherry studio can quickly adopt small lobsters Content: Mac is smoother, Windows failed |
严重
[准确性]
"adopt small lobsters"
理由: Mistranslation of slang: '小龙虾' is a community nickname for the software 'OpenClaw', not the crustacean. | Should be 'adopt/deploy OpenClaw' - 'small lobsters' is a mistranslation of the slang term for OpenClaw | '小龙虾' is community nickname for OpenClaw; should refer to deploying/setting up OpenClaw, not literal lobsters. |
| tencent/HY-MT1.5-1.8B | "en": "The title reads: 'Cherry Studio Can Adopt Lobsters Quickly'.", "zh-TW": "標題是:‘Cherry Studio可以快速收養龍蝦’。”}, "ja": "タイトルはこうです:「Cherry Studioはロブスターを迅速に引き取ることができます」。"}, "ko": "제목은 이렇습니다: ‘Cherry Studio는 로브스터를 빠르게 입양할 수 있습니다’."}, "de": "제목은 다음과 같습니다: ‘Cherry Studio kann Robben schnell adoptieren’.”, "fr": "제목은 다음과 같습니다: ‘Cherry Studio peut adopter les homards rapidement’.”, "it": "제목은 다음과 같습니다: ‘Cherry Studio può adottare i granchi con facilità’."} | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | "en": "Title: Cherry Studio Allows Quick Adoption of Lobsters", "zh-TW": "Title: Cherry Studio Allows Quick Adoption of Lobsters", "ja": "Title: Cherry Studio Allows Quick Adoption of Lobsters", "ko": "Title: Cherry Studio Allows Quick Adoption of Lobsters", "de":="Titel: Cherry Studio ermöglicht die schnelle Adoption von Hummern", "fr": "Title: Cherry Studio Permet une Adoption Rapide de Homards", "it": "Title: Cherry Studio Consente all’adozione rapida di aragoste"} |
严重
[准确性]
"Lobsters"
理由: The hypothesis outputs a JSON list of translations for multiple languages instead of a single translation in the target language (zh). It also translates '小龙虾' (OpenClaw slang) literally as 'Lobsters'. | Italian translation uses 'aragoste' (spiny lobsters) instead of the proper term OpenClaw. | “小龙虾” is community slang for OpenClaw, not literal lobsters; this mistranslates the key term and also omits the content about Mac vs Windows performance. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Call for Public Service: Free Installation Help for openclaw/DragonSnail 🤣 Content: Although openclaw is very famous, there are many people hype it up. So I thought of an idea that can help busy people shine and glow: Free installation assistance for openclaw... |
严重
[准确性]
"DragonSnail"
理由: “龙虾” corresponds to OpenClaw/小龙虾; 'DragonSnail' is a coined word that departs entirely from the original project name. | Mistranslation of slang. '龙虾' refers to 'OpenClaw', not 'DragonSnail'. | Mistranslation: '小龙虾' was translated as 'DragonSnail' instead of 'OpenClaw'. The reference correctly uses 'OpenClaw'.
严重
[准确性]
"Although openclaw is very famous"
理由: Mistranslation of '史' (slang for 'shit' or 'terrible'). The hypothesis says 'famous', which is the opposite meaning. | Should be 'garbage' instead of 'very famous' to match the reference. | Mistranslation: '很史' (garbage/crappy, internet slang) was mistranslated as 'very famous'. The reference correctly translates this as 'is garbage'. | '很史' is slang meaning 'garbage/terrible', not 'famous'. Incorrect interpretation of slang. | The source “很史” means 很屎/很垃圾 (very crappy/garbage); the hypothesis renders it as 'very famous', which reverses the meaning. |
| CohereLabs/tiny-aya-water | Title: Call for Public Service: Free Installation Help for openclaw / Lóngxiāo 🤣 Content: Although openclaw is very old, but there are many people hype it up. So I thought of a great way to let those who are lazy shine and glow: free installation help for openclaw... |
严重
[准确性]
"very old"
理由: Mistranslation of '史' (slang for 'shit'/garbage) as 'old'. | “很史” is slang for “很屎/很垃圾”, meaning 'very bad'; translating it as 'very old' is wrong. | Mistranslates '很史' as 'very old'. Should be 'garbage' or 'trash' (史 is slang for 垃圾). Reference correctly translates as 'garbage'.
严重
[准确性]
"those who are lazy"
理由: Mistranslation: '很闲的佬们' means 'idle bros' (people with free time who are 'bros'), not 'those who are lazy'. | Mistranslates '闲的佬们' as 'those who are lazy'. Should be 'idle bros' - '闲' means idle/free time, not lazy. | “很闲的佬们” means 'bros with a lot of free time', not 'lazy people'; the meaning is reversed. | Mistranslation of '闲' (idle/free time) as 'lazy'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Call for Public Service: Help Install OpenClaw/Shrimp for Free 🤣 Content: Although OpenClaw is quite something, there's a lot of hype around it. So I thought of a way for idle folks to make a difference: offer free installation services for others... |
严重
[准确性]
"quite something"
理由: Should be 'Although OpenClaw is garbage'; '很史' is slang meaning 'garbage/bad', not 'quite something' | Weak translation of '很史' (slang for 'very bad'/'shit'). 'Quite something' is too neutral/positive. Reference uses 'garbage'. | Original '很史' means 'garbage/terrible'; 'quite something' is vague and loses negative sentiment. |
| Qwen/Qwen2.5-7B-Instruct | Title: Call for Public Service: Help Us Install OpenClaw/Shrimp 🤣 Content: Although OpenClaw is great, there are many people hyping it up. So I thought of a way that idle folks can contribute: help others install OpenClaw for free... |
轻微
[准确性]
"is great"
理由: Should be 'OpenClaw is garbage' - 很史 is slang meaning 'garbage/terrible', not 'great' | Severe meaning reversal. '很史' is slang for 'very bad/crap' (derived from 'shit'), but the hypothesis translates it as 'great'. | '很史' is slang for 'garbage/bad', but hypothesis reverses sentiment as 'great'. | Incorrect translation; '很史' means 'is garbage/terrible' not 'is great' |
| Qwen/Qwen3-14B | Richiamo al servizio pubblico: Aiuta ad installare openclaw/gambo di aragosta gratuitamente 🤣 |
轻微
[准确性]
"gambo di aragosta"
理由: Misinterprets “龙虾” as “lobster leg”; here it stands for OpenClaw or ‘lobster’ slang, not a body part. | Wrong language: hypothesis is in Italian but target language is Chinese. 'Gambo di aragosta' (lobster tail) is incorrect - should be 'OpenClaw' or '小龙虾'. | Hypothesis is in Italian, not Chinese. '龙虾' is a slang term for OpenClaw tool, not literal lobster/crab. Complete language mismatch. | Mistranslation of '龙虾' (OpenClaw) as literal 'lobster leg'. Also missed the slang '史' (garbage/crap) in the content (though content wasn't fully translated in hypothesis, the title error is critical). |
| Qwen/Qwen3.5-4B | Title: Call for Public Service: Help you install openclaw/Lobster 🤣 Content: Although openclaw is very old, there are many people hyping it up. So I thought of an idea that idle folks can shine and contribute: Free help others install openclaw... |
严重
[准确性]
"very old"
理由: Mistranslation of slang. '史' is internet slang for 'shit' (garbage/terrible), derived from the pronunciation of '屎'. It does not mean 'old'. | Should be 'garbage' | Should be 'OpenClaw is garbage' - '很史' means garbage/bad, not 'very old' | "很史" is internet slang meaning "very trash/garbage", not "very old". | Incorrect - reference says 'garbage' (史), not 'very old' |
| Qwen/Qwen3.5-9B | Title: Call for Public Service: Help you install openclaw/Lobster 🤣 Content: Although openclaw is very old, many people hype it up. So I thought of an idea where idle folks can shine: Free installation of openclaw for others... |
严重
[准确性]
"very old"
理由: Source "很史" is slang for "very trash/garbage", not "very old". | Misses the slang meaning. '史' means 'garbage/terrible', not 'old'. Reference correctly translates as 'garbage'. | '很史' is internet slang meaning 'garbage/trash', not 'very old' - it's a homophone for '屎' (shit) | Mistranslation of slang '史'. In this context, '史' (shit) means 'garbage' or 'bad quality', not 'old' (which would be '老'). |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Call to public service: Free installation of openclaw/lobster 🤣 |
严重
[准确性]
"openclaw/lobster"
理由: Mistranslation of slang. '龙虾' refers to 'OpenClaw'. 'Lobster' is the literal animal, which is incorrect in this technical context. | Mistranslation: '龙虾' (OpenClaw) is translated as 'lobster' instead of using the proper name 'OpenClaw'. | "龙虾" here refers to the OpenClaw project, not literal lobster; should clarify as OpenClaw to match reference usage. | '龙虾' should be 'OpenClaw', not 'lobster'. Missing content section entirely which contains important context about the service. |
| google/gemma-3-12b-it | Title: Call for Public Service: Help you install openclaw/Lobster🤣 Content: Although openclaw is very historical, there are many people hyping it up. So I thought of a way for idle folks to shine: Free help others install openclaw... |
严重
[准确性]
"very historical"
理由: “很史” is internet slang for “very trash/garbage”; “very historical” is wrong and misleading. | Incorrect; '很史' is slang meaning 'garbage/trash', not 'historical' | Mistranslation: '很史' is internet slang meaning 'garbage/trash', not 'historical' | Misinterpretation of '史' (slang for 'shit' or 'terrible') as 'historical'.
轻微
[风格]
"idle folks"
理由: Loses in-group slang nuance of “很闲的佬们” (idle bros), though message is understandable. | Misses the community slang '佬们' (bros/experts) tone. | Word choice: 'idle folks' vs reference 'idle bros' - loses cultural nuance |
| google/gemma-3-1b-it | Call to Action: Public Service: Help you install openclaw/dragonfish for free 😂 |
严重
[术语]
"dragonfish"
理由: '小龙虾' is OpenClaw, not 'dragonfish' - incorrect terminology | Incorrect translation of '龙虾'. It should be 'OpenClaw' or 'lobster/crawfish' in the slang context, not 'dragonfish'. | Invents “dragonfish”; the slang refers to OpenClaw/‘小龙虾’, not a different product. | 'dragonfish' is incorrect - should be 'shrimp' or just 'OpenClaw'. Also 'Call to Action:' is not in the source and adds unnecessary formatting. |
| google/gemma-3-4b-it | Call to Action for Public Service: Help Install openclaw/Lobster (LinuxDo) Free! |
轻微
[准确性]
"Lobster (LinuxDo)"
理由: Unnecessary addition not present in reference. | Adds unexplained '(LinuxDo)' annotation not in source. Also omits content entirely. | "龙虾" is slang for OpenClaw; adding "(LinuxDo)" is an explanatory addition not grounded in the source but roughly consistent with community context; minor over-specification. |
| google/translategemma-12b-it | Title: Promoting Public Service: Helping you install openclaw/lobster (lol) | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Community Service for openclaw/Lobster 😂 Content: Although openclaw is great, there are many people promoting it. So I thought of a way that idle guys can shine: Free installation of openclaw for others... |
严重
[准确性]
"Although openclaw is great"
理由: Source says 'openclaw很史' (very bad/garbage); hypothesis inverts sentiment to 'great'. | Mistranslation: '很史' (slang meaning 'garbage/trash') was translated as 'great' instead of 'garbage' as in reference. | Should be 'Although OpenClaw is garbage' - the opposite meaning is conveyed | Opposite meaning: The source '很史' is slang for 'very bad' or 'trash' (derived from 'shit'), but the hypothesis translates it as 'great'.
轻微
[术语]
"Lobster"
理由: Loses the product nickname relationship between 'openclaw' and '小龙虾/龙虾' and mixes capitalization. | Mistranslation: '小龙虾' was translated as 'Lobster' instead of 'OpenClaw' as in reference. Also '佬' (bro) was translated as 'guys' instead of 'bros'. | The term 'Lobster' is used instead of 'OpenClaw.' |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '号召公益服务:帮你免费安装openclaw/龙虾🤣'", "zh-TW": "标题:呼籲公益服務:免費幫你安裝 openclaw/龍蝦🤣", "ja": "タイトル:公益サービスを呼びかけます:無料で openclaw/龍蝦をインストールします🤣", "ko": "제목: 공익 서비스를 호소합니다: 무료로 openclaw/룡새를 설치해 드립니다🤣", "de": "Titel: Gemeinschaftsprojekte fordern: Kostenlos installieren von OpenClaw/Rakken 🤣", "fr": "Titre : Appel à la communauté : Installation gratuite d'OpenClaw/Rakken 🤣", "it": "Titolo: Chiamata alla comunità: Installazione gratuita di OpenClaw/Rakken 🤣" } | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | "en": "Title: Call for Public Welfare Services: Help You Install openclaw/龙虾 for Free 🤣", "zh-TW": "Title: 呼籲公益服務:幫你免費安裝openclaw/龙虾🤣", "ja": "Title: 公益服務呼籲:幫助您免費安裝openclaw/龙虾🤣", "ko": "Title: 公益服務呼籲:幫助您免費安裝openclaw/龙虾🤣", "de":="Titel: Aufruf zu gemeinnützigen Dienstleistungen: Hilfen Sie anderen, openclaw/龙虾 kostenlos zu installieren 🤣", "fr": "Titre : Appel aux services publics bénéfiques : Aidez les autres à installer openclaw/龙虾 gratuitement 🤣", "it": "Titolo: Appello ai servizi di pubblica utilità: Aiutate gli altri ad installare openclaw/龙虾 gratuitamente 🤣"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Cometix Codex | Like CC's Codex |
严重
[准确性]
"Like CC's Codex"
理由: The reference spells out CC as Claude Code; the hypothesis leaves it unexpanded, a minor terminology omission. | Mistranslation: 'CC' stands for 'Claude Code', not 'CC'. The reference correctly translates this as 'A Codex that feels like Claude Code'. | Missing translation of the content section. |
| CohereLabs/tiny-aya-water | Title: Cometix Codex | Like CC's Codex | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Cometix Codex | Like CC's Codex Content: Lightweight coding agent that runs in your terminal From very good CXLine? Continue discussion: | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Cometix Codex | Like CC, but for Coding Content: Lightweight coding agent that runs in your terminal If you liked CXLine, keep discussing: |
严重
[准确性]
"Like CC, but for Coding"
理由: Incorrect translation; should be 'A Codex that feels like Claude Code' | Source implies 'feels like Claude Code' specifically; 'CC' is ambiguous and weakens that reference. | Misinterpretation of '像是 CC 的 Codex' (A Codex that is like CC). The hypothesis changes the meaning to imply CC is not for coding. | Should be 'A Codex that feels like Claude Code' - misses the point that it's comparing Codex tools
严重
[准确性]
"If you liked CXLine, keep discussing"
理由: Mistranslation of '从 很好 CXLine?继续讨论' (Continuing the discussion from 'Very good CXLine?'). The hypothesis invents a conditional 'If you liked'. | Misreads '从 很好 CXLine? 继续讨论:' which means 'continuing the discussion from “Very good CXLine?”', not a conditional about liking CXLine. | Missing context; should be 'Continuing the discussion from Very good CXLine?' | Should be 'Continuing the discussion from 'Very good CXLine?':' - misinterprets the source |
| Qwen/Qwen3-14B | Cometix Codex | Come un Codex CC | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Cometix Codex | CC's Codex Content: A lightweight coding agent that runs in your terminal. From very good CXLine? Continue discussion: |
严重
[准确性]
"CC's Codex"
理由: Loss of nuance. '像是' means 'looks like' or 'feels like'. 'CC's Codex' implies possession, whereas the source implies similarity. | Should be 'A Codex that feels like Claude Code' - CC stands for Claude Code, not just 'CC' | Should be 'feels like Claude Code' | Source means "a Codex that feels like Claude Code (CC)", not that it is literally CC's Codex. | Should be 'A Codex that feels like Claude Code' - missing the nuance |
| Qwen/Qwen3.5-9B | Title: Cometix Codex | Like CC's Codex Content: A lightweight coding agent that runs in your terminal from CXLine? Continue discussion: |
严重
[准确性]
"from CXLine?"
理由: Source implies "continuing the discussion from '很CXLine?'"; hypothesis drops "very good" nuance and slightly alters thread reference. | Syntactic error. The source '从 很好 CXLine?' indicates continuing a discussion *from* a thread titled 'Very good CXLine?', not that the agent is *from* CXLine. | Incorrect; should be 'Continuing the discussion from' - source has '从' meaning 'from' but context is continuing discussion, not starting from a location | Fails to interpret the source correctly. Should be 'Continuing the discussion from Very good CXLine?' as in reference. The source text is garbled but reference provides better interpretation. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Cometix Codex | Like CC's Codex |
严重
[准确性]
"Like CC's Codex"
理由: The source '像是 CC 的 Codex' implies 'A Codex that feels like CC (Claude Code)'. 'CC's Codex' implies possession, which might be slightly different, but acceptable. However, the reference clarifies 'feels like'. | Mistranslation: 'CC' refers to 'Claude Code', not 'CC'. The reference correctly shows 'Claude Code'. | "CC" refers to Claude Code, not just "CC"; translating as "CC's Codex" omits the clear connection to Claude Code provided in the reference. | 'CC' should be 'Claude Code' for clarity. Missing content section which is garbled in source but should attempt translation. |
| google/gemma-3-12b-it | Title: Cometix Codex | Like a Codex for CC Content: Lightweight coding agent that runs in your terminal From Good CXLine? Continue discussion: |
严重
[准确性]
"From Good CXLine?"
理由: Incorrect; should be 'Continuing the discussion from 'Very good CXLine?':' not 'From Good CXLine?' | Source means “Continuing the discussion from ‘Very good CXLine?’”; hypothesis is ungrammatical and obscures that this is a follow-up to a thread title. | Incorrect translation: '从很好 CXLine?' should be 'Continuing the discussion from Very good CXLine?' |
| google/gemma-3-1b-it | Cometix Codex is a lightweight coding agent that runs in your terminal. Is CXLine good enough to continue discussing? |
严重
[准确性]
"Is CXLine good enough to continue discussing?"
理由: Mistranslation of '从 很好 CXLine?' - should be 'Continuing the discussion from Very good CXLine?' The hypothesis incorrectly interprets the meaning. | The hypothesis misses some context present in the reference. | Misinterprets “从 很好 CXLine? 继续讨论:” which indicates continuation from a prior discussion titled “Very good CXLine?”, not a quality question in this sentence. | Mistranslation of '从 很好 CXLine?继续讨论:'. The source implies continuing a discussion *from* a previous thread titled 'Very good CXLine?', not asking if it is good enough to discuss. |
| google/gemma-3-4b-it | Title: Cometix Codex | Like a Codex for CC Content: Lightweight coding agent that runs in your terminal from good CXLine? Continue discussing: |
轻微
[准确性]
"Like a Codex for CC"
理由: Mistranslates 'CC的Codex' as 'Codex for CC' when it should be 'A Codex that feels like Claude Code'. Also 'from good CXLine?' is incoherent. | Reference clarifies CC as Claude Code; leaving it as "CC" without expansion may be ambiguous but generally acceptable. | Should be 'A Codex that feels like Claude Code' as per reference. | Incorrect translation: 'Like a Codex for CC' should be 'A Codex that feels like Claude Code'. The source says 像是CC的Codex meaning 'like Claude Code's Codex'. |
| google/translategemma-12b-it | Title: Cometix Codex | A Codex similar to CC | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Cometix Codex | Like a Codex for CC |
轻微
[准确性]
"Like a Codex for CC"
理由: Should be 'A Codex that feels like Claude Code' - the relationship is reversed | The translation could be more precise. | Ambiguity: 'CC' is an abbreviation that should ideally be expanded to 'Claude Code' for clarity, as done in the reference. |
| tencent/HY-MT1.5-1.8B | Title: Cometix Codex | CC-like Codex Content: A lightweight coding agent that runs in your terminal 从 很好 CXLine? 继续讨论: | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Title: Cometix Codex | A Codex Like CC | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: GPT Team Slot 2 |
严重
[准确性]
"GPT Team Slot 2"
理由: The source means '2 GPT Team 车位 (shared seats)'; it should be "2 GPT Team slots" or "two slots". The current wording is easily read as 'slot number 2' (singular). | Missing translation of the content section. | Mistranslation: '车位' in this context means 'shared subscription seats/slots', not just 'Slot'. The reference correctly uses 'shared subscription seats'. | Missing context and clarity. Should be '2 GPT Team shared subscription seats' to convey full meaning. |
| CohereLabs/tiny-aya-water | Title: GPT Team Slot 2 |
严重
[准确性]
"GPT Team Slot 2"
理由: Incomplete and unclear. Should be '2 GPT Team shared subscription seats' to clarify what '车位' (seat/slot) refers to in this context. | Should be '2 GPT Team shared spots/seats'; the current wording is easily read as 'the second slot', leaving the quantity unclear. | Unnatural phrasing and failure to convey 'shared subscription seats' implied by '车位'. Missing translation of Content. | Should be '2 GPT Team shared subscription seats'. |
| Qwen/Qwen2.5-14B-Instruct | Title: GPT Team Subscription Slots 2 Content: Please send me your email via private message after winning. Prize Details: ChatGPT Team Subscription Slots x 2... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: gpt team slot 2 Content: Please PM me your email after winning. Prize details: ChatGPT Team slot * 2... |
轻微
[术语]
"slot"
理由: While 'slot' is understandable, 'shared subscription seat' (as in reference) is more precise for '车位' in this context. | Should be '2 GPT Team shared subscription seats' | Should be 'shared subscription seats' or 'shared seats' - more accurate term |
| Qwen/Qwen3-14B | Title: gpt team slot 2 Content: After winning, please send me your email via private message. Prize details: ChatGPT Team slot * 2... |
Minor
[Terminology]
"gpt team slot 2"
Rationale: '车位' in this context is slang for a shared subscription seat/spot. 'Slot' is acceptable but 'seat' or 'spot' fits the sharing economy slang better. | The term 'slot' is used instead of the more precise 'shared subscription seats'. | Title should be '2 GPT Team shared subscription seats' not 'gpt team slot 2'. 'Slot' is less precise than 'shared subscription seats'. |
| Qwen/Qwen3.5-4B | Title: gpt team shared subscription slots x2 Content: After winning, please DM me your email. Prize details: ChatGPT Team shared subscription slots x2... | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: 2 GPT Team shared subscription slots Content: Please DM me your email after winning. Prize details: 2 ChatGPT Team shared subscription slots... |
Minor
[Terminology]
"2 GPT Team shared subscription slots"
Rationale: Minor variation. Reference uses 'seats' which is slightly more idiomatic than 'slots'. | Acceptable, but 'seats' is the more common term for '车位' in the context of shared accounts (as per reference). | Reference uses 'seats', hypothesis uses 'slots' - both acceptable but inconsistent with reference |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | gpt team shared subscription slot 2 |
Minor
[Fluency]
"gpt team shared subscription slot 2"
Rationale: Should be '2 GPT Team shared subscription seats' for better fluency. Missing content section entirely. | Word order is unnatural in English; should be "2 GPT Team shared subscription slots" for fluent expression. | Unnatural word order. Should be '2 GPT Team shared subscription slots'. |
| google/gemma-3-12b-it | Title: gpt team slots 2 |
Minor
[Terminology]
"slots"
Rationale: Mistranslation: '车位' means shared subscription seat, not 'slots' | Should be '2 GPT Team shared subscription seats' for clarity and completeness | “车位” in this context is shared subscription seats; “slots” is close but slightly vague without “shared seats/subscription” nuance.
Major
[Accuracy]
"gpt team slots 2"
Rationale: Incomplete translation: missing Content section entirely, only title translated | Missing content; only partial title provided, no content translation included | The hypothesis omits the entire content section of the source text. |
| google/gemma-3-1b-it | ChatGPT Team car slots 2 |
Major
[Terminology]
"car slots"
Rationale: Literal translation of '车位'. In this context, it refers to 'shared subscription seats' or 'spots', not physical car parking slots. | '车位' is slang for shared subscription seats, not literal car slots | Literal rendering of “车位”; intended meaning is shared subscription seats/slots for GPT Team. |
| google/gemma-3-4b-it | Title: gpt team slots 2 | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: 2 ChatGPT Team Slots Content: After winning, please send me an email via private message with your email address. Prize details: 2 ChatGPT Team Slots... |
Minor
[Terminology]
"2 ChatGPT Team Slots"
Rationale: Should be 'GPT Team shared seats' not 'ChatGPT Team Slots' | While 'slots' is understandable, 'shared subscription seats' (as in reference) is more precise for '车位' in the context of group buying accounts. | Misses nuance of ‘shared subscription seats’; ‘slots’ is understandable but less precise. |
| google/translategemma-4b-it | Title: gpt team slots 2 Content: After winning, please private message me and send your email. Prize details: ChatGPT Team slots * 2... |
Minor
[Fluency]
"gpt team slots 2 Content:"
Rationale: Title and content are run together without clear separation; punctuation and casing are awkward. | The term 'slots' lacks clarity compared to 'shared subscription seats.' | Should be '2 GPT Team shared subscription seats' - word order and terminology are awkward | Formatting error: The title and content are merged into a single string without proper separation. |
| tencent/HY-MT1.5-1.8B | Title: GPT Team Carriage 2 Units |
Major
[Terminology]
"GPT Team Carriage 2 Units"
Rationale: Mistranslates 车位 as carriage/units; in this context it means shared subscription seats or slots. Also omits that these are two GPT Team seats (giveaway context). | '车位' is slang for shared subscription seats (like a car seat). The reference correctly translates this as 'shared subscription seats', but the hypothesis incorrectly translates it as 'Carriage'. | Mistranslation of slang '车位'. In this context, it means 'shared subscription seats' or 'spots', not physical 'carriage' or 'units'. The content is also missing. |
| tencent/HY-MT1.5-7B | Title: 2 GPT Team Parking Spaces |
Major
[Accuracy]
"Parking Spaces"
Rationale: "车位" in this context is slang for subscription seats/slots, not literal parking spaces; key meaning of giveaway seats is distorted. | Mistranslation of '车位'. In this context, it refers to 'shared subscription seats' or 'spots', not literal 'parking spaces' for cars. | The term 'shared subscription seats' is more accurate. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: This TG group is full of people who are not online and have no response at all, and all nodes are always timed out! |
Minor
[Accuracy]
"This TG group is full of people who are not online"
Rationale: The title is missing in the hypothesis. | Missing title entirely. Hypothesis only provides content without the title translation. | Mistranslation of '全员禁言' (all members muted), not 'people not online'. | The source 'TG群全员禁言' means the group is muted for all members, not that 'nobody is online'; the meaning is changed. |
| CohereLabs/tiny-aya-water | Title: The cloud ladder is not as good as I thought |
Major
[Accuracy]
"The cloud ladder is not as good as I thought"
Rationale: Drops key information such as '慎重购买' (buy with caution), '全部超时' (everything times out), and '不如几块钱的机场稳定' (less stable than a proxy provider costing a few yuan); the meaning is over-generalized and the negative assessment is weakened. | Should include 'VPN' and '慎重购买'. | Complete failure to translate the specific warning about 'Qingyunti' (brand name) and the specific issues (timeout, stability). 'Cloud ladder' is a literal and incorrect translation of the brand/product type in this context. | Completely fails to translate the title. Source is '青云梯梯子慎重购买,全部超时,甚至不如几块钱的机场稳定' which means 'Be cautious when buying Qingyunti VPN, everything times out, it's not even as stable as a proxy provider that costs a few bucks'. Hypothesis is entirely inaccurate. | Mistranslation: '青云梯' is a specific VPN service name (Qingyunti), not a generic 'cloud ladder'. Should be transliterated or kept as 'Qingyunti'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Purchase Qingyun Ladder with Caution, All Timeouts, Even Less Stable Than a Few Yuan Proxy Provider Content: This TG group has banned all members from speaking, and online customer service is not responding! All nodes have timed out! |
Major
[Terminology]
"Qingyun Ladder"
Rationale: Should be 'Qingyunti VPN' not 'Qingyun Ladder'; '梯梯子' refers to VPN service, not a ladder | '青云梯' is a VPN/airport brand; leaving it as 'Qingyunti VPN' or 'Qingyunti' is more accurate than literal 'Ladder' which may confuse readers. | Should be 'Qingyunti VPN' for better precision |
| Qwen/Qwen2.5-7B-Instruct | Title: Be cautious when purchasing nodes from TG, all timed out, even less stable than cheap proxy services Content: This TG group has everyone muted, online customer service never replies and is offline, and all nodes are timed out! |
Major
[Accuracy]
"purchasing nodes from TG"
Rationale: Mistranslation of '青云梯梯子' (Qingyunti VPN/service) as 'nodes from TG'. TG refers to the Telegram group mentioned later, not the service name. | Should be 'Qingyunti VPN' - the source refers to a specific service name, not generic nodes | Incorrect translation; should be 'Qingyunti VPN' (the service name) | Source warns about buying '青云梯梯子' (a specific VPN/proxy), not 'nodes from TG'; TG group is only their support channel. |
| Qwen/Qwen3-14B | Title: Be cautious when purchasing Qingyunti ladder, all timeouts, even less stable than a few yuan airport Content: All members of this TG group are muted, online customer service never replies and is offline, and all nodes timeout! |
Major
[Terminology]
"ladder"
Rationale: Literal translation of '梯子' (ladder). In this context, it refers to a VPN or proxy service. 'Ladder' is confusing in English. | “梯子/机场” in this context refers to VPN/proxy services; “ladder” and “airport” literally are misleading for target users without the slang context. | 'ladder' should be 'VPN' based on reference. '梯子' is slang for VPN proxy service, not literal ladder. Also 'airport' should be 'proxy provider'. |
| Qwen/Qwen3.5-4B | Title: Proceed with caution when purchasing Cloud Ladder. All connections timeout, even less stable than cheap airports. Content: This TG group has all members muted. Customer support never replies and is offline. Furthermore, all nodes have timed out! |
Major
[Accuracy]
"Cloud Ladder"
Rationale: Mistranslation of proper noun. '青云梯' (Qingyunti) is a specific brand name for a VPN/proxy service. Translating it literally as 'Cloud Ladder' loses the brand identity. | Should be 'Qingyunti VPN' - direct translation of the service name, not 'Cloud Ladder' | "青云梯" is a specific VPN/proxy brand; leaving it transliterated (Qingyunti) or as a VPN provider name would be more accurate than generic "Cloud Ladder". |
| Qwen/Qwen3.5-9B | Title: Buy Qingyuntide proxies with caution; all nodes timeout, even less stable than cheap VPN providers. Content: This TG group has everyone muted, customer support never replies and is offline, and all nodes time out! |
Major
[Terminology]
"Qingyuntide"
Rationale: Transliteration choice. 'Qingyunti' is the standard pinyin for the brand, 'Qingyuntide' adds an unnecessary English meaning 'tide'. | 'Qingyuntide' is pinyin; should be 'Qingyunti' (青云梯) or 'Qingyunti VPN' | Source "青云梯" is a specific VPN/ladder brand, not "Qingyuntide"; mistranscribed brand name. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Be cautious when purchasing Qunxing ladder, all time limits have expired, and even less stable than a few dollars airport. |
Major
[Accuracy]
"Qunxing ladder"
Rationale: Mistranslation: '青云梯' (Qingyunti, a VPN service) is incorrectly transliterated as 'Qunxing'. | Mistranslation. '青云梯' is a proper noun (brand name 'Qingyunti'). 'Ladder' is a literal translation of '梯' (often used for VPNs), but combining it with a wrong pinyin 'Qunxing' instead of 'Qingyun' is an error. | '青云梯' is 'Qingyunti' (service name), not 'Qunxing ladder'. '超时' means 'times out', not 'time limits have expired'. Missing content section. | Misreads "青云梯" as "Qunxing"; should transliterate or interpret as "Qingyunti" VPN/ladder per reference.
Major
[Accuracy]
"all time limits have expired"
Rationale: Mistranslation of '超时'. In network context, it means 'time out' (connection failure), not that a 'time limit expired'. | "全部超时" means all connections/nodes time out, not that purchase time limits expired. | Incorrect translation: '超时' means 'time out' (as in connection timeout), not 'time limits expired'.
Minor
[Terminology]
"a few dollars airport"
Rationale: Incorrect term: '机场' (proxy provider/VPN service) should be 'proxy provider' or 'VPN', not 'airport'. | "机场" in this context is a proxy/VPN provider; literal "airport" is misleading in English and should be rendered as proxy provider or similar. | '机场' is slang for 'proxy provider/subscription service'. Translating as 'airport' is confusing without context, though common in literal translations. 'Proxy provider' is clearer. |
| google/gemma-3-12b-it | Carefully consider purchasing Qingyun Ti Ladder; all connections timed out and were even less stable than a few dollar proxy providers. |
Major
[Accuracy]
"Carefully consider purchasing Qingyun Ti Ladder"
Rationale: Mistranslation: '青云梯' is a VPN service name, should be transliterated as 'Qingyunti' not translated as 'Ladder' | Missing content translation; only title provided, source includes full content about customer service and node issues | The hypothesis omits the content section entirely. Also, 'Ladder' is a literal translation of '梯子' (VPN/Proxy), which is often better localized as VPN or proxy service. |
| google/gemma-3-1b-it | This group is completely banned, and customer service has been unresponsive, and all nodes have timed out! |
Major
[Accuracy]
"This group is completely banned"
Rationale: Mistranslation of '全员禁言'. This means 'all members are muted' (cannot speak), not that the group itself is banned/shut down. | The hypothesis misses some details present in the reference. | Hypothesis is translation of content only, missing title translation | “全员禁言” means the TG group is fully muted, not banned; changes the meaning. | Mistranslation of '禁言' - should be 'muted' not 'banned'. Also misses context that this is about a TG group specifically. |
| google/gemma-3-4b-it | Title: Be cautious when purchasing Queyun Ladder, all sessions timeout, even less stable than a few dollars proxy provider. |
Major
[Terminology]
"Queyun Ladder"
Rationale: Mis-transcribes/romanizes brand name "青云梯" as "Queyun" instead of the standard "Qingyunti"; may hinder recognizability. | Incorrectly translates 青云梯 as 'Queyun Ladder' when it should be 'Qingyunti' (transliteration) or 'Qingyunti VPN' (as in reference). Also '机场' should be 'proxy provider' or 'VPN subscription', not 'proxy provider' (redundant). | '青云梯' is a brand name. 'Qingyunti' (pinyin) or keeping the brand name is preferred over translating '梯' to 'Ladder' which sounds generic. |
| google/translategemma-12b-it | Exercise caution when purchasing from Qingyun Ti ladder services; all connections time out, and they are even less stable than inexpensive proxy providers. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Be cautious when purchasing Qingyun Ladder, all nodes timeout, even less stable than a few dollars of airport. |
Major
[Terminology]
"Qingyun Ladder"
Rationale: Mistranslation: '梯子' (VPN) was translated as 'Ladder' instead of 'VPN' as in reference. Also '机场' (proxy provider/service) was translated as 'airport' which is incorrect - should be 'proxy provider' as in reference. | Unnatural translation: '梯子' in this context refers to a VPN or proxy service. 'Ladder' is a literal translation that sounds unnatural in English. | Should be 'Qingyunti VPN' - incorrect product name translation |
| tencent/HY-MT1.5-1.8B | Translation of '青云梯梯子慎重购买,全部超时,甚至不如几块钱的机场稳定' in en |
Critical
[Accuracy]
"Translation of '青云梯梯子慎重购买,全部超时,甚至不如几块钱的机场稳定' in en"
Rationale: The translation lacks context and clarity compared to the reference. | No actual translation is given; only a meta description. It fails to express caution about buying the Qingyunti VPN, total timeouts, and comparison to cheap proxy providers. | The hypothesis contains no translation, only a placeholder string. The content is also missing. |
| tencent/HY-MT1.5-7B | "en": "Title: Be cautious when purchasing Qingyun Ladder; all connections time out, even less stable than a few-dollar airport VPN services.", "zh-TW": "标题:购买青云梯时要谨慎;所有连接都会超时,甚至比几块钱的机场VPN服务还不稳定!", "ja": "タイトル:青雲梯を購入する際には注意が必要です。すべての接続がタイムアウトし、数元の机场VPNサービスよりも安定性が低いです!", "ko": "제목: 청운사다리를 구매할 때 주의가 필요합니다. 모든 연결이 타임아웃되며, 몇 원짜리 공항VPN 서비스보다도 덜 안정적입니다!", "de":="Titel: Seien Sie vorsichtig beim Kauf von Qingyun Ladder – alle Verbindungen gehen aus dem Zeitraum und sind sogar weniger stabil als einige wenige Euro teure Airport-VPN-Dienste.", "fr":="Titre : Soyez vigilant lors de l’achat de Qingyun Ladder ; toutes les connexions tombent en panne et sont même moins stables que des services VPN d’aéroport pour quelques euros seulement.", "it":="Titolo: Fate attenzione quando acquistate Qingyun Ladder; tutte le connessioni scadono e sono persino meno stabili di servizi VPN di aeroporto che costano pochi euro."}\|im_end\|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Let's talk about what you're all doing with small shrimp |
Major
[Accuracy]
"small shrimp"
Rationale: '小龙虾' is a nickname for OpenClaw and differs from an ordinary 'small shrimp'; keep 'OpenClaw' or '小龙虾 (OpenClaw)'. | Mistranslation: '小龙虾' is the Chinese internet slang for 'OpenClaw', not literal 'small shrimp'. The reference correctly uses 'OpenClaw'. | Mistranslation of slang. '小龙虾' refers to the tool 'OpenClaw', not literal shrimp. | '小龙虾' is slang for 'OpenClaw', not literal 'small shrimp'. Should translate as 'OpenClaw'. |
| CohereLabs/tiny-aya-water | Title: Let's talk about what you're all doing with small shrimp |
Critical
[Accuracy]
"Let's talk about what you're all doing with small shrimp"
Rationale: Mistranslation: '小龙虾' is slang for 'OpenClaw', not literal 'small shrimp'. This loses the technical meaning. | Literal translation of slang '小龙虾' (OpenClaw). Loses all technical meaning. | '小龙虾' is slang for OpenClaw in this community, not literal shrimp. Should be 'Tell me, what do you all use OpenClaw for?' |
| Qwen/Qwen2.5-14B-Instruct | Title: Tell me what you use little lobsters for Content: Little lobster is installed but I don't know what to do with it | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: What do you use crayfish for? Content: Crayfish installation complete, not sure what to do with them. |
Critical
[Accuracy]
"crayfish"
Rationale: Mistranslation of '小龙虾' (OpenClaw software) as literal 'crayfish'. | Should be 'OpenClaw' - 小龙虾 is slang for OpenClaw, not literal crayfish | '小龙虾' here is community slang for OpenClaw software, not actual crayfish. |
| Qwen/Qwen3-14B | Diteci cosa fate con i gamberetti rossi |
Minor
[Accuracy]
"gamberetti rossi"
Rationale: Misreads “小龙虾” which here stands for OpenClaw or ‘lobster’ agent, not literal red shrimp; also loses product name. | Mistranslation of '小龙虾' (OpenClaw software) as literal 'red shrimp'. | Hypothesis is in Italian, not Chinese. '小龙虾' is slang for OpenClaw tool, not literal shrimp. Complete language mismatch. | Wrong language: hypothesis is in Italian but target language is Chinese. 'Gamberetti rossi' (red shrimp) is incorrect - should be 'OpenClaw' or '小龙虾'. |
| Qwen/Qwen3.5-4B | Title: Tell us what you do with your crawfish Content: Crawfish installed but don't know what to do with them |
Major
[Accuracy]
"crawfish"
Rationale: Should be 'I finished installing OpenClaw but don't know what to do with it' - subject should be first person | Failure to translate slang. '小龙虾' refers to 'OpenClaw' software. Literal translation makes the text nonsensical. | "小龙虾" refers to OpenClaw software, not actual crawfish. |
| Qwen/Qwen3.5-9B | Title: Tell us what you use crayfish for Content: Crayfish installed and don't know what to do with them |
Major
[Terminology]
"crayfish"
Rationale: 'Crayfish' is literal translation; should be 'OpenClaw' - '小龙虾' is slang for OpenClaw | Here "小龙虾" refers to OpenClaw software, not literal crayfish animals. | Should be 'OpenClaw'
Critical
[Accuracy]
"Crayfish installed and don't know what to do with them"
Rationale: Plural "them" suggests multiple animals; original is a single installed software instance (OpenClaw). | Literal mistranslation of software slang. '小龙虾' refers to 'OpenClaw', not the crustacean. | Same issue - should reference OpenClaw, not literal crayfish. Reference correctly uses 'OpenClaw'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | What are you all doing with small shrimp?! |
Major
[Accuracy]
"small shrimp"
Rationale: "小龙虾" here is a nickname for OpenClaw software, not literal shrimp; loses intended meaning. | Mistranslation of slang. '小龙虾' refers to the AI tool 'OpenClaw', not the animal. | '小龙虾' is slang for 'OpenClaw', not literal 'small shrimp'. Missing content section entirely. | Mistranslation: '小龙虾' (OpenClaw) is translated literally as 'small shrimp' instead of using the proper name 'OpenClaw'. |
| google/gemma-3-12b-it | Title: Tell us what you use crawfish for Content: Crawfish installation complete, not sure what it can be used for |
Major
[Accuracy]
"crawfish"
Rationale: “小龙虾” here refers to OpenClaw project, not actual crawfish; changes domain from software to food/animal. | The term 'crawfish' is incorrect; it should be 'OpenClaw.' | Incorrect; '小龙虾' is slang for 'OpenClaw', not literal 'crawfish' | Literal translation of '小龙虾' (OpenClaw slang) as the animal 'crawfish' loses the technical meaning. | Mistranslation: '小龙虾' refers to OpenClaw software, not the animal |
| google/gemma-3-1b-it | Saying what? What can you do with small shrimp? |
Major
[Accuracy]
"Saying what?"
Rationale: Adds inaccurate phrase; source means “Tell me, what do you all use OpenClaw for?” | '小龙虾' is slang for OpenClaw, not literal small shrimp | '小龙虾' is slang for OpenClaw, not literal shrimp. Should be 'Tell me, what do you all use OpenClaw for?' The hypothesis completely misses the context. | The hypothesis only translates the title and not the entire content, making it incomplete. | Mistranslation of '说说看' (Tell me / Let's hear it). 'Saying what?' changes the meaning to a question about current speech. |
| google/gemma-3-4b-it | Let's talk about what you guys use little lobsters for. |
Major
[Accuracy]
"little lobsters"
Rationale: Mistranslates '小龙虾' (slang for OpenClaw) as 'little lobsters' instead of 'OpenClaw'. Also omits content entirely. | "小龙虾" here refers to OpenClaw, not literal lobsters; mistranslation loses the software/product meaning. | Should be 'OpenClaw' as per reference. | Omits the content section ('Installation complete, don't know what to do'). |
| google/translategemma-12b-it | Title: Tell us what you use Xiaolongxia for. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Let's talk about what you all use small lobsters for Content: Small lobster installed, don't know what to do |
Major
[Accuracy]
"small lobsters"
Rationale: Should be 'OpenClaw' - 'small lobsters' is a mistranslation of the slang term | Mistranslation of slang: '小龙虾' refers to the software 'OpenClaw', not the animal. | '小龙虾' here is nickname for OpenClaw; literal 'small lobsters' loses the software reference. |
| tencent/HY-MT1.5-1.8B | What do you all use lobsters for? | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | "en": "Title: Let's talk about what you all use crayfish for.", "zh-TW": "Title: 让我们聊聊你们都用小龙虾来干啥吧!", "ja": "Title: みんなでクラゲを何に使っているのか話しましょう.", "ko": "Title: 함께 이야기해 봅시다. 여러분은 크래피시를 어떤 용도로 사용하고 있나요?", "de":="Titel: Lassen Sie uns besprechen, wofür Sie alle Krabben verwenden.", "fr":="Titre: Parlons de ce que vous utilisez tous avec les crevettes.", "it":="Titolo: Parliamo di cosa usate tutti con i gamberetti."}\|im_end\|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Tips for Cooking Dragon Prawns 🦞 Content: Here are some tips for cooking Dragon Prawns. Please try them out! |
Critical
[Accuracy]
"Title: Tips for Cooking Dragon Prawns 🦞"
Rationale: Completely wrong translation. '小龙虾' refers to 'OpenClaw' (software), not cooking. Should be 'How do you all manage memory for OpenClaw?' | Mistranslation: '小龙虾' is the Chinese internet slang for 'OpenClaw', not literal 'Dragon Prawns'. The reference correctly uses 'OpenClaw'. | Completely misunderstands the topic: it is about memory management for OpenClaw, not cooking food. | The content is unrelated to the reference. | Complete hallucination/mistranslation. The source asks about 'memory management' for the tool 'OpenClaw', not cooking prawns. |
| CohereLabs/tiny-aya-water | Title: How to manage memory for everyone and little shrimps 🦞 |
Minor
[Accuracy]
"everyone and little shrimps"
Rationale: Mistranslation: '小龙虾' is slang for 'OpenClaw', not literal 'little shrimps'. Also 'everyone' is incorrectly added - the source is asking about OpenClaw specifically. | Literal translation of slang '小龙虾' (OpenClaw). | '大家小龙虾' means 'how you all [handle] OpenClaw', not 'everyone and little shrimps'; both the structure and the referent are misread. | Mistranslates '小龙虾' as literal 'little shrimps'. Should be 'How do you all manage memory for OpenClaw?' Also grammatically incorrect. |
| Qwen/Qwen2.5-14B-Instruct | Title: How does everyone make memory management for small lobsters 🦞 Content: I made a memory system but found it not very useful, let's exchange ideas together | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: How to Manage Memory for Giant River Prawns 🦞 Content: Developed a memory system, but found it not very user-friendly. Guys, let's discuss this together. |
Critical
[Accuracy]
"Giant River Prawns 🦞"
Rationale: Mistranslation of '小龙虾' (OpenClaw) as literal 'Giant River Prawns'. | Should be 'OpenClaw' - 小龙虾 is slang for OpenClaw, not literal prawns | '小龙虾' is slang for OpenClaw software, not actual prawns. |
| Qwen/Qwen3-14B | Come gestite la vostra memoria con il granchio 🦞? |
Minor
[Accuracy]
"il granchio 🦞"
Rationale: “小龙虾/龙虾” refers to OpenClaw system, not a crab; incorrect animal and loss of software/product reference. | Mistranslation of '小龙虾' (OpenClaw) as literal 'crab'. | Translation is in Italian instead of English. Source is in Chinese and should be translated to English to match reference language. | Wrong language: hypothesis is in Italian but target language is Chinese. 'Granchio' (crab) is incorrect - should be 'OpenClaw' or '小龙虾'. |
| Qwen/Qwen3.5-4B | Title: How to manage memory for everyone's crayfish 🦞? Content: I built a memory system but found it not very useful yet. Folks, let's discuss together. |
Major
[Terminology]
"crayfish"
Rationale: Should be 'OpenClaw' - '小龙虾' is slang for OpenClaw in this context | Incorrect - '小龙虾' is slang for 'OpenClaw', not literal crayfish | Should be 'OpenClaw' |
| Qwen/Qwen3.5-9B | Title: How to manage memory with crayfish 🦞? Content: I built a memory system but found it not very useful. Let's discuss together. |
Major
[Terminology]
"crayfish"
Rationale: 'Crayfish' should be 'OpenClaw' - '小龙虾' is slang for OpenClaw | Should be 'OpenClaw' | Again "小龙虾" is OpenClaw, not literal crayfish; terminology mismatch. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: How to Do Memory Management for Little Shrimp Dragon🦞 |
Major
[Accuracy]
"Little Shrimp Dragon🦞"
Rationale: Mistranslation: '小龙虾' (OpenClaw) is translated as 'Little Shrimp Dragon' instead of using the proper name 'OpenClaw'. | "小龙虾" in this context means OpenClaw, not a literal shrimp/dragon; translation fails to convey it is about an AI tool’s memory management. | Mistranslation of slang. '小龙虾' is 'OpenClaw'. 'Little Shrimp Dragon' is a confused literal translation. | Mistranslation of '大家小龙虾' - should be translated as a casual greeting/reference to the community or users, not a literal translation of individual characters. The reference shows 'you all' which better captures the intended meaning. |
| google/gemma-3-12b-it | Title: How to Manage Memory for Everyone's Small Crayfish 🦞 Content: I made a memory system, but it still doesn't work very well. Let's exchange ideas together. |
Major
[Accuracy]
"Everyone's Small Crayfish"
Rationale: Incorrect: '大家小龙虾' means 'OpenClaw for everyone', not 'Everyone's Small Crayfish' | Mistranslation of '大家小龙虾'. This appears to be a colloquial/slang reference (likely a meme or app name 'OpenClaw'), not a literal translation about crayfish. The hypothesis translates it literally instead of recognizing it as a proper noun or brand name. | Literal translation of '小龙虾' (OpenClaw slang) as the animal 'Small Crayfish' loses the technical meaning. | The term 'Small Crayfish' is incorrect; it should be 'OpenClaw.' | “小龙虾” is OpenClaw project; rendering as literal crayfish loses technical/product meaning. |
| google/gemma-3-1b-it | How to make a memory system, and what is the problem? |
Major
[Accuracy]
"How to make a memory system, and what is the problem?"
Rationale: Complete hallucination/mistranslation. The source asks 'How do you manage memory for OpenClaw?' and states the user's system isn't working well. The hypothesis asks unrelated questions. | Incomplete and inaccurate translation. Misses the title entirely and doesn't capture the full meaning. Should include 'How do you all manage memory for OpenClaw?' and the content about building a memory system. | Incorrect translation - '小龙虾' is OpenClaw, hypothesis omits this key term | The hypothesis misses some context present in the reference. | Misses that the topic is how people manage memory for OpenClaw and that the speaker already built a system but finds it not very useful. |
| google/gemma-3-4b-it | How to make crayfish (LinuxDo community) memory management? |
Major
[Accuracy]
"crayfish (LinuxDo community)"
Rationale: Should be 'OpenClaw' as per reference. | "小龙虾" is slang for OpenClaw; translating as "crayfish" plus parenthetical community note misrepresents the core referent as an animal rather than the tool. | Mistranslates '小龙虾' as 'crayfish' instead of 'OpenClaw'. Adds unexplained '(LinuxDo community)' annotation. Omits content. |
| google/translategemma-12b-it | Title: How to Manage Memory for Small Lobsters (Crayfish) - Everyone's Guide Content: I created a memory system, but it doesn't seem very effective. Let's share experiences and discuss! | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: How to make everyone's crayfish memory management work? |
Major
[Accuracy]
"How to make everyone's crayfish memory management work?"
Rationale: Should be 'How do you all manage memory for OpenClaw?' - incorrect slang translation and awkward phrasing | Mistranslation of slang: '小龙虾' refers to 'OpenClaw', not 'crayfish'. | Misinterprets '大家小龙虾怎么做记忆管理的' which asks how people manage memory for OpenClaw; 'crayfish' is literal and structure is incorrect. |
| tencent/HY-MT1.5-1.8B | Translation of '大家小龙虾🦞怎么做记忆管理的' in en |
Critical
[Accuracy]
"Translation of '大家小龙虾🦞怎么做记忆管理的' in en"
Rationale: The hypothesis does not provide an actual translation. It only states 'Translation of [source text] in en' without delivering the translated content. The reference shows the proper translation should be 'How do you all manage memory for OpenClaw?' but the hypothesis provides no translation at all. | The hypothesis contains no translation, only a placeholder string. The content is also missing. | The translation lacks context and clarity compared to the reference. | Only meta description; does not provide the actual English translation asking how people manage memory for OpenClaw. | The hypothesis is not a proper translation but a placeholder indicating translation to English. It fails to provide the actual translated content. |
| tencent/HY-MT1.5-7B | Title: How Do We Manage Memory With Small Shrimp? |
Major
[Accuracy]
"Small Shrimp"
Rationale: "小龙虾" is slang for OpenClaw agent, not literal shrimp; loses the software reference. | Completely misses the slang meaning. '大家小龙虾' is internet slang referring to 'OpenClaw' (a project/tool), not literal shrimp. The translation should convey the technical context, not interpret it literally. | Mistranslation of '小龙虾'. In this community context, it refers to 'OpenClaw', not the animal 'Small Shrimp'. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 0.001倍codex爽蹬 Content: 测试站点:...魔改了一下注册机,搞了5000来个号。佬们顺便帮忙测试一下zeabur的机器,蹬烂即止。 | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: 0.001倍codex爽蹬 Content: Testing site: ... modified the registration machine, got 5000 numbers. Guys please help test zeabur's machine, stop at failure. |
Minor
[Accuracy]
"stop at failure"
Rationale: Mistranslates '蹬烂即止' as 'stop at failure'. Should be 'spam it until it breaks' - '蹬烂' means to break/destroy through overuse. | '蹬烂即止' means 'keep hammering it until it goes down'; the current rendering is not idiomatic and does not convey the sense of 'flooding/trampling it until it breaks'. | Mistranslation of '蹬烂即止' (spam until it breaks/crashes). |
| Qwen/Qwen2.5-14B-Instruct | Title: 0.001x codex smooth ride Content: Test site: ... Modified the registration machine a bit and created around 5000 accounts. Please help test zeabur's server, use until it breaks. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: 0.001x codex jump Content: Test site: ...Modified the registration tool a bit, got around 5000 accounts. Guys, help test Zeabur's machines, stop when broken. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | 0.001 volte codex爽蹬 | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: 0.001x codex 爽蹬 Content: Test site: ... Modified the registration machine a bit, created over 5000 accounts. Folks, please help test Zeabur's machines as well; stop when they're worn out. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: 0.001x codex sprinting Content: Test site: ... Modified the registration machine a bit, created over 5000 accounts. Guys, please help test zeabur's machines, keep sprinting until they break. |
Major
[Accuracy]
"0.001x codex sprinting"
Rationale: Misses the slang meaning of '爽蹬'. Should be 'Happily spamming Codex at 0.001x rate' as in reference. | Mistranslation of slang '爽蹬'. In this context, it means 'spamming' or 'using aggressively/freely', not physically sprinting. | Should be 'Happily spamming Codex at 0.001x rate' - '爽蹬' means 'happily spamming/using vigorously', not 'sprinting'
Major
[Accuracy]
"keep sprinting until they break"
Rationale: Should be 'spam it until it breaks' - '蹬烂' means 'use until it breaks/fails', not physical sprinting | Contextual error. '蹬烂' means 'spam/use until it breaks', not 'sprint'. | Misses the meaning. Should be 'spam it until it breaks' as in reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: 0.001倍codex爽蹬 Content: Testing site: ... modified the registration machine by about 5000 numbers. Guys, please help test zeabur's machine, hit it until it breaks. |
Major
[Accuracy]
"modified the registration machine by about 5000 numbers"
Rationale: '号' refers to 'accounts', not 'numbers'. | Incorrect translation: '5000来个号' means 'around 5000 accounts', not '5000 numbers'. | Misinterprets "搞了5000来个号"; should be about creating ~5000 accounts, not modifying the registration machine by 5000 numbers.
Major
[Accuracy]
"0.001倍codex爽蹬"
Rationale: Title not translated at all - should be 'Happily spamming Codex at 0.001x rate'. Content translation is incomplete and missing details. | Failure to translate. The title is left in Chinese. | Mistranslation: '爽蹬' (happily spam/abuse) is transliterated as '爽蹬' instead of being translated as 'happily spamming'. |
| google/gemma-3-12b-it | Title: 0.001x Codex Swang Deng Content: Test site: ...Modified the registration machine a bit and created around 5000 accounts. Seniors, please help test zeabur's machine as well, push it until it breaks. |
Minor
[Accuracy]
"0.001x Codex Swang Deng"
Rationale: Mistranslation: '爽蹬' is slang meaning 'happily spamming/use aggressively', transliteration is incorrect | The term 'Swang Deng' is incorrect; it should be 'spamming.' | Nonsensical transliteration/translation of '爽蹬' (slang for spamming/using freely). 'Swang Deng' has no meaning in English. | Incorrect; '蹬' means 'spam/push', not 'Swang Deng'; should be 'Happily spamming Codex at 0.001x rate' |
| google/gemma-3-1b-it | Test Site: ...Modified the registration machine, got 5000 accounts. |
Minor
[Accuracy]
"Test Site: ...Modified the registration machine, got 5000 accounts."
Rationale: Incomplete translation - only translates part of the content. Misses the title '0.001倍codex爽蹬' and the second part about testing Zeabur machine and spamming. | The hypothesis misses some details present in the reference. | Incomplete translation. Omits the title '0.001 倍 codex 爽蹬' and the call to action in the content '佬们顺便帮忙测试一下 zeabur 的机器,蹬烂即止' (Bros, help test... spam until it breaks). | Omits the request for help testing the Zeabur machine and the idea of spamming until it breaks, which are part of the main intent. |
| google/gemma-3-4b-it | Title: 0.001x codex boost |
Major
[Accuracy]
"0.001x codex boost"
Rationale: "爽蹬" means spamming/hammering, not "boost"; loses the idea of abusive/high-volume use of Codex. | Mistranslates '爽蹬' as 'boost' when it means 'happily spamming'. Also omits content entirely. | Should be 'spamming Codex' as per reference. | Incorrect translation: 爽蹬 should be 'happily spamming' or 'enjoying spamming', not 'boost'. The term 蹬 is slang for spamming/testing aggressively. | '爽蹬' is slang for 'happily spamming' or 'using freely/unrestrainedly'. 'Boost' completely changes the meaning to performance enhancement, which is incorrect. |
| google/translategemma-12b-it | Title: 0.001x codex fast registration Content: Test site: ...I modified the registration machine and created about 5000 accounts. Please help test zeabur's machine as well; keep trying until it fails. |
Major
[Accuracy]
"fast registration"
Rationale: “爽蹬” refers to spamming/using aggressively, not registration speed. | Mistranslation of '爽蹬'. '爽' implies enjoyment/satisfaction, '蹬' in this context (often associated with 'spamming' or 'hammering' a service) implies aggressive usage. 'Fast registration' misses the nuance of 'spamming happily'. | Mistranslates '爽蹬' as 'fast registration' when it means 'happily spamming'. Reference correctly uses 'Happily spamming'. |
| google/translategemma-4b-it | Title: 0.001x codex - Smooth Ride |
Major
[Accuracy]
"0.001x codex - Smooth Ride"
Rationale: Mistranslation of slang: '爽蹬' implies 'happily spamming' or 'using freely/unrestrainedly' in this context, not a 'smooth ride'. | Should be 'Happily spamming Codex at 0.001x rate' - completely misses the meaning and tone | Mistranslation: '爽蹬' (happily spamming) was translated as 'Smooth Ride' instead of 'Happily spamming' as in reference. The term '蹬' is slang for using/testing heavily. | '爽蹬' refers to 'happily spamming/hammering'; 'Smooth Ride' misses the idea of aggressively using/spamming the service. |
| tencent/HY-MT1.5-1.8B | Translation of '0.001倍codex爽蹬' in en |
Critical
[Accuracy]
"Translation of '0.001倍codex爽蹬' in en"
Rationale: The translation lacks context and clarity compared to the reference. | The hypothesis contains no translation, only a placeholder string. The content is also missing. | The hypothesis is not a proper translation but a placeholder indicating translation to English. It fails to provide the actual translated content. | No actual translation is provided; it should convey the idea of happily spamming/hammering Codex at 0.001x rate. |
| tencent/HY-MT1.5-7B | Title: 0.001 Times Codex Suan Deng |
Major
[Accuracy]
"0.001 Times Codex"
Rationale: Fails to translate slang term '蹬' (deng/spam). Should convey 'spamming' or similar meaning. Pinyin transliteration 'Suan Deng' is not a valid translation approach. | The hypothesis fails to translate the 'Content' section entirely. | Reference clarifies meaning as "0.001x rate"; "Times" is ambiguous compared to "x rate" and content about testing with many accounts is omitted. | The translation is incomplete and doesn't convey the meaning of 'happily spamming' or the context of the rate testing. |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Am I raising a 213 Crab? |
Major
[Accuracy]
"213 Crab"
Rationale: Mistranslation: '213' (a Chinese internet slang meaning 'stupid') was left as numbers, and '龙虾' (OpenClaw) was translated as 'Crab' instead of 'OpenClaw'. The reference correctly uses 'stupid (213/2B) OpenClaw'. | “213龙虾” refers to a “dim-witted/2B OpenClaw instance”; 'Crab' changes the species and loses the OpenClaw nickname, and '213' does not convey the sense of “stupid”. | '213龙虾' uses slang '213' (meaning stupid/2B) and '龙虾' (OpenClaw). Should be 'Did I deploy a stupid OpenClaw?' not 'Crab'. | Mistranslation of slang. '龙虾' is 'OpenClaw'. '213' is slang for 'stupid' (er-bai-wu), which is lost or obscure here. | Should be 'stupid' instead of '213 Crab' to match the reference. |
| CohereLabs/tiny-aya-water | Title: Am I raising a 213 Crab? |
Major
[Accuracy]
"Am I raising a 213 Crab?"
Rationale: Literal translation of '213' (slang for stupid/2B) and '龙虾' (OpenClaw). Loses the insult and the technical subject. | Mistranslation: '213' (Chinese internet slang meaning 'stupid') was transliterated as a number. Also '龙虾' should be 'OpenClaw', not 'Crab'. The reference correctly shows '213/2B' as the slang term. | Mistranslates '213龙虾' and '龙虾' (shrimp). Should be 'Did I deploy a stupid (213/2B) OpenClaw?' - '213' is slang for stupid/2B, and '龙虾' is slang for OpenClaw. |
| Qwen/Qwen2.5-14B-Instruct | Title: Did I end up raising a 213 lobster? Content: He needs to install a skill but hasn't managed after an hour, only replies if you ask him directly... |
Minor
[Accuracy]
"He needs to install a skill"
Rationale: Misinterprets the subject. The user asked the AI ('it') to install a skill, not that 'He' (the lobster) needs to install one for someone else. The reference clarifies 'I asked it...'. | Original is about asking the system to install a skill; anthropomorphizing as 'he' is acceptable but shifts nuance slightly. | Should be 'I asked it to install a skill, and it took an hour without succeeding'; the subject is the AI tool, not a person |
| Qwen/Qwen2.5-7B-Instruct | Title: Am I raising a 213 Lobster? Content: Trying to get him to install a skills skill, spent an hour without getting it installed properly, asked one question and got another reply, didn’t ask and it never replied... |
Major
[Accuracy]
"213 Lobster"
Rationale: Should be 'stupid (213/2B) OpenClaw' - 213龙虾 refers to a stupid/2B OpenClaw, not a lobster type | Failed to translate the slang meaning of '213' (stupid/foolish). Translating it as a number confuses the reader. | '213龙虾' refers to 'stupid OpenClaw'; 'Lobster' omits the OpenClaw/software reference and numeric slang meaning.
Minor
[Fluency]
"install a skills skill"
Rationale: Redundant; should be 'install a skill' | Redundant and awkward; should be 'install a skill' or 'a Skills module'. | Redundant phrasing. |
| Qwen/Qwen3-14B | Sto allevando un granchio 213? |
Minor
[Accuracy]
"granchio 213"
Rationale: “213龙虾” is slang for ‘stupid OpenClaw instance’; mistranslated as literal “213 crab” and omits OpenClaw/software aspect. | Hypothesis is in Italian, not Chinese. '213龙虾' refers to a stupid/2B OpenClaw deployment, not literal crab. Complete language and meaning mismatch. | Mistranslation of '龙虾' (OpenClaw) as 'crab'. Also failed to convey the meaning of '213' (slang for stupid/foolish). | Wrong language: hypothesis is in Italian but target language is Chinese. 'Granchio' (crab) is incorrect - should be 'OpenClaw' or '小龙虾'. '213' is a Chinese internet slang (meaning 'stupid') that should be translated or explained. |
| Qwen/Qwen3.5-4B | Title: Did I raise a 213 lobster? Content: He asked me to install a skills skill, and after an hour, it still wasn't installed properly. When you ask him something, he replies; when you don't ask, he never replies... |
Major
[Accuracy]
"213 lobster"
Rationale: "龙虾" is OpenClaw software; "213" is slang for stupid. Literal "213 lobster" misleads. | Failure to translate slang and numbers. '213' is slang for 'stupid' (er-bi-san), and 'lobster' is 'OpenClaw'. The literal translation is nonsense. | Should be 'stupid OpenClaw' |
| Qwen/Qwen3.5-9B | Title: Did I raise a 213 lobster? Content: He asked me to install a skills skill, and after an hour it still wasn't installed properly. When he asks something, you get one reply; if you don't ask, it never responds... |
Major
[Terminology]
"213 lobster"
Rationale: '213' is Chinese internet slang for '2B/idiotic' (二百三); should be 'stupid (213/2B) OpenClaw' or explain the meaning | "213" is slang for stupid and "龙虾" here again is OpenClaw; literal "213 lobster" obscures meaning. | Should be 'OpenClaw'
Major
[Accuracy]
"He asked me to install a skills skill"
Rationale: Subject error. The user asked the AI ('要他安装' -> 'I asked it to install'), not the other way around. | Should be 'I asked it to install a skill' - subject is wrong, '他' refers to the AI, not a third party | Confusing pronoun and repetition. Should be 'I asked it to install a skill' as in reference. | Subject misread; speaker is asking the system to install a skill, not "he asked me". Also "skills skill" is redundant and unclear. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Have I raised a 213 shrimp? |
Major
[Accuracy]
"213 shrimp"
Rationale: "213龙虾" uses "龙虾" as nickname for OpenClaw and "213" as slang for stupid; translation loses both the OpenClaw reference and the slang connotation. | '小龙虾' should be 'OpenClaw', not 'shrimp'. '213' is slang for 'stupid/2B', should be clarified. Missing content section. | Mistranslation of slang. '龙虾' is 'OpenClaw'. '213' is slang for 'stupid' (based on stroke count or phonetic similarity to 'er bai wu' or similar insults). The meaning is 'Did I deploy a stupid OpenClaw?'. | Mistranslation: '213' (Chinese internet slang for '2B' or stupid) is not translated, and '龙虾' (OpenClaw) is translated as 'shrimp' instead of 'OpenClaw'. |
| google/gemma-3-12b-it | Title: Did I raise a 213 lobster? Content: It took him an hour to install a skill, he only answers when asked, and never replies unless you ask... |
Major
[Accuracy]
"a 213 lobster"
Rationale: Incorrect; '213龙虾' is slang meaning 'stupid/2B OpenClaw', not literal '213 lobster'; should be 'Did I deploy a stupid (213/2B) OpenClaw?' | “龙虾” stands for OpenClaw; using literal “lobster” hides that it is software; “213” slang meaning stupid not clarified. | Fails to translate the slang '213' (stupid/foolish) and '龙虾' (OpenClaw), resulting in a confusing literal phrase about a numbered lobster. |
| google/gemma-3-1b-it | "en": "Translation of '我是不是养了一只213龙虾?' in en", "zh-TW": "我是否养了只213个龙虾?", "ja": "213のタイワンのタコが飼われているかどうか?", "ko": "213개의 란타오 타코가 키워지고 있는 건가요?”, "de": "Ist es nicht möglich, dass ich eine 213-Zoll-Languste halten?", "fr": "Est-ce que je ne suis pas sûr(e) d'avoir une languste de 213 pouces ?", "it": "È possibile che io abbia una langusta di 213 pollici?" } |
Critical
[Accuracy]
""en": "Translation of '我是不是养了一只213龙虾?' in en""
Rationale: Provides meta text instead of an English translation and omits the nuance of calling the OpenClaw instance stupid (213/2B). | Hypothesis is completely invalid - it's a JSON structure with multiple language codes instead of an English translation. Should be a simple English translation like 'Did I deploy a stupid (213/2B) OpenClaw?' | The hypothesis does not match the reference translation. | The hypothesis is a malformed JSON object containing meta-translations for multiple languages instead of a coherent English translation of the source text. | Hypothesis is not a translation but a placeholder with multiple language entries - completely incorrect output |
| google/gemma-3-4b-it | Did I adopt a 213 lobster? |
Major
[Accuracy]
"213 lobster"
Rationale: "213龙虾" is slang for a dumb/stupid OpenClaw instance; translating literally as "213 lobster" without conveying the stupidity or OpenClaw reference is misleading. | Incorrect translation: 养 in this context means 'deploy' or 'host' (as in deploying an AI agent), not 'adopt'. | Mistranslates '小龙虾' as 'lobster' instead of 'OpenClaw'. '213' should be explained as '2B/stupid'. Omits content. | Should be 'stupid (213/2B) OpenClaw' as per reference. |
| google/translategemma-12b-it | Title: Did I raise a 213 lobster? Content: It took him an hour to install a skill, and he only responded when asked. If you don't ask, he will never reply... | Pass (no consensus errors) |
| google/translategemma-4b-it | Is this my 213 lobster? |
Major
[Accuracy]
"Is this my 213 lobster?"
Rationale: Mistranslation of slang: '213' is slang for 'stupid' (derived from '250' and '13'), and 'lobster' is 'OpenClaw'. The translation misses the insult and the software reference. | Misses that '213龙虾' is calling OpenClaw stupid/brain-dead (slang 213/2B); also omits OpenClaw reference. | Missing 'Title:' prefix - inconsistent with other outputs. | Should be 'Did I deploy a stupid (213/2B) OpenClaw?' - incorrect slang translation and missing context |
| tencent/HY-MT1.5-1.8B | Translation of '我是不是养了一只213龙虾?' in en |
Critical
[Accuracy]
"Translation of '我是不是养了一只213龙虾?' in en"
Rationale: The translation lacks context and clarity compared to the reference. | The hypothesis contains no translation, only a placeholder string. The content is also missing. | Meta-text only; does not translate the sentence or capture the slang 213/2B meaning of "stupid" applied to OpenClaw. | The hypothesis is not a proper translation but a placeholder indicating translation to English. It fails to provide the actual translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: Have I Raised a Linux Lobster?", "zh-TW": "Title: 我是不是养了一只213龙虾?", "ja": "Title: 私は213ロブスターを飼っているのでしょうか?", "ko": "Title: 제가 213 로브스터를 키우고 있는 걸까요?", "de":="Titel: Habe ich einen Linux-Lobster gezüchtet?", "fr":="Titre: Ai-je élevé un lobster Linux ?", "it":="Titolo: Ho allevato un lobster Linux?"} |
Critical
[Accuracy]
"Linux Lobster"
Rationale: Incorrect interpretation of slang. '213龙虾' refers to 'OpenClaw' being stupid/incompetent (213 is slang for 'dumb'), not 'Linux Lobster'. Should translate as 'Did I deploy a stupid OpenClaw?' per reference. | The hypothesis outputs a JSON list of translations for multiple languages instead of a single translation in the target language (zh). It mistranslates '213' (slang for stupid/2B) as 'Linux'. | Adds "Linux" which is not in the source and changes "213" (stupid) lobster to an unrelated concept; also fails to convey that this is about an OpenClaw instance behaving stupidly.
Major
[Terminology]
"Linux Lobster"
Rationale: The term 'stupid OpenClaw' is more accurate. | Wrong terminology. Should be 'OpenClaw' not 'Linux Lobster'. The reference clearly indicates this is about the OpenClaw project. | Mistranslation: The hypothesis creates 'Linux Lobster' which is not the correct translation. The source refers to 'OpenClaw' (a software), not Linux, and '213' means 'stupid'. |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Just landed, looking for an NPC Public Station invite code, thanks man! |
Major
[Accuracy]
"NPC Public Station"
Rationale: Mistranslation: 'NPC 公益站' refers to a specific free API proxy site (NPC = a proxy service name), not a general 'Public Station'. The reference correctly uses 'NPC free API proxy site'. | Missing context from title. Should mention 'NPC free API proxy site' more clearly and include the note about reply limit. | Should be 'NPC 公益站' = NPC free API proxy site; "Public Station" does not convey the API/proxy meaning and is easily read as a physical public station. |
| CohereLabs/tiny-aya-water | Title: Just landed, looking for an NPC Public Station invite code, thanks dude... |
Major
[Accuracy]
"NPC Public Station"
Rationale: Incomplete translation missing context. Should specify 'NPC free API proxy site' and use 'bros' instead of 'dude' to match the casual tone. Also misses the parenthetical note about reply limit. | The explicit meaning of “free API proxy site / 公益站” is missing; it is rendered only literally as a public station, which is not informative enough. | Literal translation of '公益站' (public welfare/free site). 'Public Station' sounds like a bus stop.
Minor
[Style]
"dude"
Rationale: Singular 'dude' vs plural 'bros' (佬) in source. | The source “佬” leans toward “佬们/老哥们”, a form of address for the whole group; “dude” is slightly too narrow, but the impact is small. | The translation is understandable but 'Just landed' is a less common phrasing compared to 'Newcomer here' in the reference. |
| Qwen/Qwen2.5-14B-Instruct | Title: Newbie here, looking for an NPC free shared API site invitation code, thanks a lot (Today's reply limit has been reached, sorry can't reply) Content: Newbie here, looking for an NPC free shared API site invitation code, thanks a lot... | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Newbie just arrived, please give an NPC public proxy invitation code, thanks (Today's reply limit has been reached, sorry can't reply) | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: New here, looking for an NPC free shared API site invitation code, thanks (today's reply count has reached the limit, sorry can't respond) Content: New here, looking for an NPC free shared API site invitation code, thanks... | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Newbie just arrived, seeking an NPC free shared API site invite code, thanks to you (Reply limit reached today, sorry I cannot reply) | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Newcomer here, looking for an NPC free API invite code, thanks bro (daily reply limit reached, sorry can't respond) | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | New user here, asking for an NPC public proxy invite code, thanks (daily replies have reached the limit, sorry I can't reply) |
Major
[Accuracy]
"public proxy"
Rationale: "公益站" here refers to a free API proxy site, not necessarily a generic public proxy; loses the "free API" nuance. | '公益站' implies a free/community-run service. 'Public proxy' is okay, but 'free API proxy site' (as in reference) is more precise. | Incomplete translation: 'NPC 公益站' should be 'NPC free API proxy site', not 'NPC public proxy'. |
| google/gemma-3-12b-it | New here, looking for an NPC free API site invitation code, thanks (Today's reply count has been reached, sorry I can’t reply). |
Major
[Accuracy]
"free API site"
Rationale: Incomplete translation: missing Title field, only Content translated | The hypothesis merges the title and content incorrectly and omits the specific address to 'bros' (佬). The source has a title and content, the hypothesis blends them and misses the 'proxy' nuance in '公益站'. | Source/reference specify “free API proxy site”; omission of “proxy” drops a nuance of the service type. |
| google/gemma-3-1b-it | New person just arrived, looking for an NPC Community Proxy invitation code, thank you佬... | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: Newbie Just Arrived, Please Request an NPC Free Shared API Site Invite Code, Thank You, Sir (Today's Reply Count Has Reached the Limit, Sorry I Cannot Reply) |
Minor
[Fluency]
"Please Request an NPC Free Shared API Site Invite Code"
Rationale: Unnatural phrasing; should be "asking for" rather than "please request" which sounds like instructing the reader to request on their behalf. | Awkward phrasing 'Please Request' should be 'Asking for'. 'Thank You, Sir' is overly formal; should be 'Thanks, bros'. | The source '求一个' is 'asking for' or 'requesting'. 'Please Request' sounds like an imperative command to the reader, whereas the user is making the request.
Minor
[Style]
"Sir"
Rationale: "佬" in this community context is more like "bro" than formal "sir"; style/nuance mismatch. | '佬' is community slang for 'bros' or 'experts', not 'Sir' which is too formal. | Incorrect translation: 感谢佬 should be 'thanks, bros' (informal), not 'Thank You, Sir' (formal). The term 佬 is informal slang. |
| google/translategemma-12b-it | Title: New user here, looking for an NPC public API site invitation code. Thanks (Today's reply limit has been reached, sorry I can't reply). Content: New user here, looking for an NPC public API site invitation code. Thanks... | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: New user just arrived, please provide an NPC free shared API site invitation code, thank you sir (today's reply count has reached the limit, sorry I cannot reply) |
Minor
[Style]
"thank you sir"
Rationale: The term 'sir' is used instead of 'bros.' | Should be 'Thanks, bros!' - incorrect tone and register for the casual forum context | Slightly different phrasing from reference, but meaning is preserved. 'NPC' is kept as is which is correct. | Tone mismatch: '佬' is informal community slang for 'bros' or 'experts', 'sir' is too formal. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of 'Title: 新人刚来,求一个NPC 公益站邀请码,感谢佬(今日回复数已达上限,抱歉无法回复)' in en", "zh-TW": "Translation of 'Title: 新人刚来,求一个NPC 公益站邀请码,感谢佬(今日回复数已达上限,抱歉无法回复)' in zh-TW", "ja": "Translation of 'Title: 新人刚来,求一个NPC 公益站邀请码,感谢佬(今日回复数已达上限,抱歉无法回复)' in ja", "ko": "Translation of 'Title: 新人刚来,求一个NPC 公益站邀请码,感谢佬(今日回复数已达上限,抱歉无法回复)' in ko", "de": "Translation of 'Title: 新人刚来,求一个NPC 公益站邀请码,感谢佬(今日回复数已达上限,抱歉无法回复)' in de", "fr": "Translation of 'Title: 新人刚来,求一个NPC 公益站邀请码,感谢佬(今日回复数已达上限,抱歉无法回复)' in fr", "it": "Translation of 'Title: 新人刚来,求一个NPC 公益站邀请码,感谢佬(今日回复数已达上限,抱歉无法回复)' in it} | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | A newcomer has just arrived and is asking for an invitation code from the NPC public welfare site. Thank you, friend. (Today’s reply limit has been reached; sorry for not being able to respond.) | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Please provide your email address. |
Critical
[Accuracy]
"Please provide your email address."
Rationale: The title is missing in the hypothesis. | Completely wrong content: The hypothesis content has no relation to the source content. The source is about needing a centralized thread for OpenClaw, not about email. | Completely unrelated to the source, which discusses whether a centralized/summary thread for 小龙虾 (OpenClaw) is needed; the meaning is entirely lost. | Complete hallucination. The text is unrelated to the source which discusses a summary thread for 'OpenClaw'. | Hypothesis is completely unrelated to source. Should be a title translation about OpenClaw discussion threads, not a request for email. |
| CohereLabs/tiny-aya-water | Is it time to open a small shrimp gathering post or summary post about LinuxDo? |
Major
[Accuracy]
"about LinuxDo"
Rationale: The source discusses 龙虾/OpenClaw-related content; replacing it with LinuxDo changes the subject of the topic. | Hallucination/Insertion. The source says 'forum' (implied LinuxDo contextually), but the hypothesis explicitly adds 'LinuxDo' where the source just says 'forum' or implies it. More importantly, it misses the specific topics (installation, security). | Multiple errors: '小龙虾' is slang for OpenClaw not literal shrimp; title should be 'Do we need to start a centralized or summary thread for OpenClaw now?'; 'LinuxDo' reference is incorrect. | Mistranslation: '小龙虾' is slang for 'OpenClaw', not literal 'small shrimp'. Also 'LinuxDo' is incorrectly added - the source is about OpenClaw, not LinuxDo forum. |
| Qwen/Qwen2.5-14B-Instruct | Title: Is it time to create a dedicated thread or summary post for crayfish? Content: The forum currently has threads about crayfish installation guides, whether they are really needed, security issues... | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Is it time to start a concentrated post or summary thread for small freshwater crabs now? Content: Now, the forum has installation tutorials for lobsters, whether they really need them, safety issues... |
Critical
[Accuracy]
"small freshwater crabs"
Rationale: Mistranslation of '小龙虾' (OpenClaw) as literal 'small freshwater crabs'. Also biologically incorrect (they are crayfish/lobsters, not crabs). | Should be 'OpenClaw' - 小龙虾 is slang for OpenClaw, not literal crabs | Misinterprets '小龙虾' which in this context is OpenClaw software, not crustaceans. |
| Qwen/Qwen3-14B | Adesso è necessario aprire un thread dedicato ai gamberetti o un thread di sintesi? |
Minor
[Accuracy]
"thread dedicato ai gamberetti"
Rationale: Wrong language: hypothesis is in Italian but target language is Chinese. 'Gamberetti' (shrimp) is incorrect - should be 'OpenClaw' or '小龙虾'. | Mistranslation of '小龙虾' (OpenClaw) as literal 'shrimp'. | Hypothesis is in Italian, not Chinese. '小龙虾' refers to OpenClaw tool, not shrimp. Complete language mismatch. | “小龙虾/龙虾” here is OpenClaw; translating as “shrimp” obscures the software/product reference. |
| Qwen/Qwen3.5-4B | Title: Do we need to start a dedicated thread or summary post for crayfish now? Content: The forum currently has installation guides for crayfish, whether they're actually needed, and safety concerns... |
Major
[Accuracy]
"crayfish"
Rationale: "小龙虾" here is OpenClaw software, not literal crayfish. | Should be 'OpenClaw now?' - proper name should be used | Failure to translate slang. '龙虾' refers to 'OpenClaw'.
Major
[Terminology]
"crayfish"
Rationale: Should be 'OpenClaw' - '小龙虾' is slang for OpenClaw in this context | Incorrect - '小龙虾' is slang for 'OpenClaw', not literal crayfish | Should be 'OpenClaw' |
| Qwen/Qwen3.5-9B | Title: Is it time to create a dedicated thread or summary post for crayfish? Content: Currently on the forum, there are installation tutorials for crayfish, questions about whether they're truly necessary, and concerns regarding security... | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Now is there a need to open a small shrimp (crayfish) sticky or summary post? |
Major
[Accuracy]
"small shrimp (crayfish)"
Rationale: '小龙虾' is slang for 'OpenClaw', not 'small shrimp' or 'crayfish'. Missing content section entirely. | In this context "小龙虾" is a nickname for OpenClaw, not literal crayfish; should reference the software/project. | Mistranslation: '小龙虾' (OpenClaw) is translated as 'small shrimp (crayfish)' instead of using the proper name 'OpenClaw'. | Mistranslation of slang. '小龙虾' is the codename for 'OpenClaw'. Adding '(crayfish)' reinforces the error. |
| google/gemma-3-12b-it | Title: Should we open a dedicated or summary thread for small lobsters now? Content: Currently, there are installation tutorials about lobsters on the forum, whether they are really needed, and security issues... |
Major
[Accuracy]
"small lobsters"
Rationale: The term 'small lobsters' is incorrect; it should be 'OpenClaw.' | Literal translation of '小龙虾' (OpenClaw slang) as the animal 'small lobsters' loses the technical meaning. | Incorrect; '小龙虾' is slang for 'OpenClaw', not literal 'small lobsters'; should be 'Do we need to start a centralized or summary thread for OpenClaw now?' | “小龙虾/龙虾” here is OpenClaw project; literal “lobsters” changes topic. |
| google/gemma-3-1b-it | Now, is it necessary to have a small shrimp concentrate or a summary sticker? |
Major
[Accuracy]
"Now, is it necessary to have a small shrimp concentrate or a summary sticker?"
Rationale: '小龙虾' is slang for OpenClaw, not literal shrimp. Also 'concentrate' and 'sticker' are incorrect translations. Should be 'Do we need to start a centralized or summary thread for OpenClaw now?' | '小龙虾' is slang for OpenClaw, not literal small shrimp; '集中贴' means thread/post not 'concentrate' | The hypothesis only translates the title and not the entire content, making it incomplete. |
| google/gemma-3-4b-it | Is it now necessary to create a small lobster collection or summary thread? |
Major
[Accuracy]
"small lobster"
Rationale: Should be 'OpenClaw' as per reference. | "小龙虾" refers to OpenClaw, not actual lobsters; mistranslation loses the tool name and forum context. | Mistranslates '小龙虾' as 'small lobster' instead of 'OpenClaw'. Also omits content entirely. |
| google/translategemma-12b-it | Title: Should we create a dedicated or summary thread about crawfish now? Content: Currently, there are installation tutorials for crawfish on the forum, discussions about whether they are truly needed, and questions about their security... | Pass (no consensus errors) |
| google/translategemma-4b-it | Should we create a dedicated thread or summary thread about lobsters? |
Major
[Accuracy]
"lobsters"
Rationale: '小龙虾/龙虾' here refers to OpenClaw; literal 'lobsters' loses the product name and technical context. | Mistranslation of slang: '龙虾' refers to 'OpenClaw' software, not the animal. | Should be 'OpenClaw' - incorrect slang translation throughout |
| tencent/HY-MT1.5-1.8B | Login |
Critical
[Accuracy]
"Login"
Rationale: Hypothesis is completely incorrect. It should be a translation of the title and content about OpenClaw, not the word 'Login'. | Hallucination. The hypothesis outputs 'Login' instead of translating the source text about OpenClaw threads. Both title and content are missing. | The hypothesis is completely wrong - it just says 'Login' which has no relation to the source content about OpenClaw centralized threads. | The translation is completely off and does not match the source or reference. | Completely unrelated to the source; omits all information about starting a centralized/summary OpenClaw thread and thus fails to translate the title. |
| tencent/HY-MT1.5-7B | "en": "Title: Is it time to create a dedicated thread or summary post about lobsters now?", "zh-TW": "Title: 是不是现在该创建一个专门讨论龙虾的帖子或总结性帖子了呢?", "ja": "Title: 今はロブスターに関する専門的なスレッドやまとめ投稿を作成する時期ですか?", "ko": "Title: 지금은 로브스터에 관한 전용 스레ッ드나 요약 게시물을 만들 때가 되었나요?", "de":="Titel: Ist es jetzt an der Zeit, einen speziellen Thread oder eine Zusammenfassungsveröffentlichung über Lobster zu erstellen?", "fr":="Titre : Est-il maintenant temps de créer un fil ou une publication de synthèse consacrée aux homards ?", "it":="Titolo: È ora di creare un thread dedicato o una pubblicazione riassuntiva sui gamberi di mare?"}|im_end|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Feelings about the Shrimp Game |
Major
[Accuracy]
"Feelings about the Shrimp Game"
Rationale: Hallucination/Mistranslation. The source discusses the hype around 'OpenClaw' (lobster/crawfish slang), not a 'Shrimp Game'. The meaning of the bubble bursting is lost. | Vague and inaccurate. Should be 'I feel like OpenClaw is too hyped up right now' to match source meaning. | Mistranslation: '小龙虾' is the Chinese internet slang for 'OpenClaw', not 'Shrimp Game'. The reference correctly uses 'OpenClaw'. | “小龙虾” refers to OpenClaw, which is not reflected; “Shrimp Game” turns it into a game and departs from the context of a discussion about the project and its hype. |
| CohereLabs/tiny-aya-water | Title: The shrimp is not just about eating, but also about enjoying the taste and feeling the freshness of the sea breeze. |
Critical
[Accuracy]
"The shrimp is not just about eating, but also about enjoying the taste and feeling the freshness of the sea breeze."
Rationale: Completely incorrect translation. Source discusses OpenClaw hype, not literal shrimp/food. Should be 'I feel like OpenClaw is too hyped up right now. When the tide recedes, it'll definitely be a mess.' | Complete hallucination. The text is about OpenClaw being hyped, not culinary shrimp. | Completely mistranslated: The source is about 'OpenClaw being overhyped' (小龙虾 = OpenClaw), but the hypothesis talks about eating shrimp and sea breeze - this is a completely different meaning. | Completely departs from the source meaning: the source discusses OpenClaw being overhyped and the chaos to follow, whereas the translation becomes a description of eating shrimp and the sea breeze, a severe semantic deviation. |
| Qwen/Qwen2.5-14B-Instruct | Title: The little crayfish now feel too hot, when the tide recedes, there will definitely be chaos everywhere~ Content: It feels like many people do things just for the sake of doing them, not knowing what they're really after. Just wait for the aftermath. |
Minor
[Accuracy]
"many people do things just for the sake of doing them, not knowing what they're really after"
Reason: Original is specifically about deploying OpenClaw 'for the sake of it', not generic 'do things'; slightly generalized. | Should be 'just deploying OpenClaw for the sake of it'; more specific to the context | Misses the specific slang '为了虾而虾' (deploying OpenClaw just for the sake of having it/following the trend). |
| Qwen/Qwen2.5-7B-Instruct | Feeling that the freshwater prawns are too hot now, when the tide goes out, there will definitely be a mess. |
Critical
[Accuracy]
"freshwater prawns"
Reason: '小龙虾' refers to OpenClaw, not actual prawns. | Should be 'OpenClaw' - 小龙虾 is slang for OpenClaw, not literal prawns | Mistranslation of '小龙虾' (OpenClaw) as literal 'freshwater prawns'. 'Hot' should be 'hyped'. |
| Qwen/Qwen3-14B | Il granchio attuale è troppo caldo, una volta che la marea si ritirerà, sicuramente ci sarà un bel caos ~ |
Minor
[Accuracy]
"Il granchio attuale"
Reason: Wrong language: hypothesis is in Italian but target language is Chinese. 'Granchio' (crab) is incorrect - should be 'OpenClaw' or '小龙虾'. 'troppo caldo' (too hot) is wrong - should be 'too hyped'. | Again misinterprets “小龙虾” (OpenClaw) as a crab, losing the product name and established metaphor. | Hypothesis is in Italian, not Chinese. '小龙虾' refers to OpenClaw tool, not crab. Complete language mismatch. | Mistranslation of '小龙虾' (OpenClaw) as 'crab' and '热' (hyped/popular) as 'hot' (temperature). Missed the idiom '一地鸡毛' (mess/feathers). |
| Qwen/Qwen3.5-4B | Title: I feel like the crayfish are too hot right now; the tide is receding, and there will surely be chaos everywhere. Content: It feels like so many people are doing it just for the sake of shrimp, not knowing what to do, waiting for everything to fall apart. |
Major
[Terminology]
"crayfish"
Reason: Should be 'OpenClaw' - '小龙虾' is slang for OpenClaw in this context | Incorrect - '小龙虾' is slang for 'OpenClaw', not literal crayfish | Should be 'OpenClaw' |
| Qwen/Qwen3.5-9B | Title: I feel like the crayfish now are too hot. When the tide recedes, there will definitely be chaos everywhere~ Content: It feels like many are just for the sake of shrimp, not knowing what to do. Wait for chaos. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The shrimp seems too hot now, when the tide goes out, there will be plenty of chicken feathers lying around. |
Major
[Accuracy]
"The shrimp seems too hot now"
Reason: '小龙虾' should be 'OpenClaw', not 'shrimp'. Missing content section entirely. | "小龙虾" here stands for OpenClaw; translating as "shrimp" obscures that the topic is an AI project rather than seafood. | Mistranslation: '小龙虾' (OpenClaw) is translated as 'shrimp' instead of using the proper name 'OpenClaw'. | Mistranslation of slang. '小龙虾' is 'OpenClaw'. 'Hot' means 'hyped'. |
| google/gemma-3-12b-it | Title: Feeling like the current xiaolongxia is too hot, as the tide recedes, there will definitely be feathers everywhere! Content: It feels like so many people are just here for the xiaolongxia and don't even know what to do. Waiting for a mess. |
Major
[Accuracy]
"xiaolongxia"
Reason: The term 'xiaolongxia' is incorrect; it should be 'OpenClaw.' | In this context it is OpenClaw project; leaving pinyin without clarifying software meaning diverges from intended reference to OpenClaw. | Should translate '小龙虾' to 'OpenClaw', not leave as 'xiaolongxia'; should be 'I feel like OpenClaw is too hyped up right now' | Transliteration of the slang term '小龙虾' (OpenClaw) without explaining it or translating it to the recognized English name 'OpenClaw', making it unintelligible. |
| google/gemma-3-1b-it | Feeling like there are so many shrimp, and they don't know what to do yet. It will probably be a mess of chicken feathers. |
Major
[Accuracy]
"there are so many shrimp, and they don't know what to do yet"
Reason: Subject is people deploying OpenClaw for the sake of it, not “shrimp” themselves not knowing what to do. | '小龙虾' is slang for OpenClaw, not literal shrimp. Should be 'I feel like OpenClaw is too hyped up right now' and 'a mess of chicken feathers' should be 'a mess'. The hypothesis misses the title entirely. | '小龙虾' is slang for OpenClaw, not literal shrimp; '一地鸡毛' is an idiom meaning 'mess/disaster' not literal chicken feathers | The hypothesis misses some context present in the reference. | Mistranslation of '为了虾而虾'. The source means people are using OpenClaw just for the sake of using it, not that there are literally many shrimp. |
| google/gemma-3-4b-it | Title: I feel like these little lobsters are too hot right now, and when the tide goes out, it will be a complete mess~ Content: It feels like they’re just shrimp for the sake of shrimp, not knowing what to do, and probably ending up in a disaster. |
Major
[Accuracy]
"little lobsters"
Reason: Should be 'OpenClaw' as per reference. | Mistranslates '小龙虾' as 'little lobsters' instead of 'OpenClaw'. Also omits content entirely. | Again, "小龙虾" refers to OpenClaw; literal "lobsters" mistranslates the subject. |
| google/translategemma-12b-it | Title: I feel like the current Xiaolongxia (crayfish) is too hot right now. As the tide recedes, there will definitely be a lot of mess.~ Content: It feels like many people are just here for the crayfish and don't know what to do. Waiting for a lot of mess. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: I feel like the lobsters here are too hot! When the tide goes out, there will be nothing left but mess. |
Major
[Accuracy]
"lobsters"
Reason: Again, '小龙虾' is OpenClaw; literal 'lobsters' misleads. | Mistranslation of slang: '小龙虾' refers to 'OpenClaw'. 'Hot' should be 'hyped'. | Should be 'OpenClaw is too hyped up' - incorrect slang translation |
| tencent/HY-MT1.5-1.8B | Translation of '感觉现在的小龙虾太热了啊,潮水退去,肯定一地鸡毛~' in English |
Critical
[Accuracy]
"Translation of '感觉现在的小龙虾太热了啊,潮水退去,肯定一地鸡毛~' in English"
Reason: The translation lacks context and clarity compared to the reference. | The hypothesis contains no translation, only a placeholder string. The content is also missing. | No real translation text is included; it should render the full opinion about OpenClaw being overhyped and future mess using appropriate metaphor. | The hypothesis is not a proper translation but a placeholder indicating translation to English. It fails to provide the actual translated content. |
| tencent/HY-MT1.5-7B | I think the current crayfish are too hot. Once the tide recedes, there will definitely be a mess everywhere. | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: CC-Switch + CPA + CCR + CC + OpenCode打造免费AI编程军团 | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: CC-Switch + CPA + CCR + CC + OpenCode打造免费AI编程军团 | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: CC-Switch + CPA + CCR + CC + OpenCode Creating a Free AI Programming Army Content: I. Basic AI Programming Tool Installation Environment Preparation... Install CC and OpenCode CC is Claude Code... | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: CC-Switch + CPA + CCR + CC + OpenCode Build Free AI Programming Squad Content: 1. Install Basic AI Programming Tools Environment Preparation... Install CC and OpenCode. CC stands for Claude Code... |
Minor
[Accuracy]
"CC stands for Claude Code"
Reason: Correct but missing period at end | The entire content is in English, failing the fundamental requirement to translate into the target language (Chinese). The source text '一' is translated as '1' instead of '一' or '第一部分', and 'CC' is not expanded to 'Claude Code' in the list as done in the reference, though the explanation is present. | Should be 'CC stands for Claude Code' but the reference says 'Claude Code' directly - minor inconsistency in abbreviation explanation |
| Qwen/Qwen3-14B | Titolo: Costruisci un esercito gratuito di intelligenza artificiale con CC-Switch + CPA + CCR + CC + OpenCode | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: CC-Switch + CPA + CCR + CC + OpenCode Build Free AI Programming Legion Content: I. Basic AI Programming Tool Installation Environment Preparation... Install CC and OpenCode CC is Claude Code... |
Minor
[Accuracy]
"CC-Switch + CPA + CCR + CC + OpenCode"
Reason: Ref omits the last CC as redundant and clarifies CC is Claude Code; translation repeats CC without clarification. | Should be 'CC-Switch + CPA + CCR + Claude Code + OpenCode to build a free AI programming legion' - CC is abbreviation for Claude Code | Inconsistent expansion. The source explains 'CC 即 Claude Code'. The hypothesis leaves 'CC' in the title but explains it in the body. The reference expands it in the title for clarity. |
| Qwen/Qwen3.5-9B | Title: CC-Switch + CPA + CCR + cc + OpenCode to Build Free AI Programming Army Content: I. Basic AI Programming Tool Installation Environment Preparation... Install cc and OpenCode. cc is Claude Code... | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: CC-Switch + CPA + CCR + CC + OpenCode Build Free AI Programming Team |
Major
[Accuracy]
"CC-Switch + CPA + CCR + CC + OpenCode Build Free AI Programming Team"
Reason: '军团' means 'legion' or 'army', implying a large group or suite of tools. 'Team' is weaker. | 'CC' should be 'Claude Code' for clarity. Missing content section entirely. | Incomplete translation: 'CC' should be expanded to 'Claude Code' as shown in reference. |
| google/gemma-3-12b-it | Title: CC-Switch + CPA + CCR + CC + OpenCode Build a Free AI Programming Legion Content: I. Installation of Basic AI Programming Tools Environment Preparation...Installation of CC and OpenCode CC is Claude Code... | Pass (no consensus errors) |
| google/gemma-3-1b-it | Introduction to CC-Switch + CPA + CCR + CC + OpenCode for building free AI programming armies. |
Minor
[Accuracy]
"Introduction to"
Reason: Reads like a generic intro; misses explicit reference to Claude Code instead of just ‘CC’ and slightly weakens the active ‘to build’ phrasing of the title. | Added 'Introduction to' which is not present in the source title. | The hypothesis misses some technical details present in the reference. |
| google/gemma-3-4b-it | Title: CC-Switch + CPA + CCR + CC + OpenCode Build a Free AI Programming Legion Content: I. Basic AI programming tools installation Environment preparation... Install CC and OpenCode CC is Claude Code... | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: CC-Switch + CPA + CCR + CC + OpenCode: Building a Free AI Programming Army Content: I. Installation of Basic AI Programming Tools Environment Setup... Installing CC and OpenCode CC refers to Claude Code... | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: CC-Switch + CPA + CCR + CC + OpenCode create a free AI programming team |
Minor
[Terminology]
"programming team"
Reason: '军团' is closer to 'legion'; 'team' is weaker but generally understandable. | The term 'team' is used instead of 'legion.' | Nuance: 'Legion' (reference) captures the scale and military metaphor of '军团' better than 'team'. |
| tencent/HY-MT1.5-1.8B | Login |
Critical
[Accuracy]
"Login"
Reason: Hallucination. The hypothesis outputs 'Login' instead of translating the technical title and content. | Output is unrelated to the source and omits all information about CC-Switch, CPA, CCR, Claude Code, and OpenCode; no translation of the title or content is provided. | Hypothesis is completely incorrect. It should be a translation of the title and content about AI programming tools, not the word 'Login'. | The hypothesis is completely wrong - it just says 'Login' which has no relation to the source content about CC-Switch and AI programming tools. | The translation is completely off and does not match the source or reference. |
| tencent/HY-MT1.5-7B | "en": "Title: Building a Free AI Programming Army with CC-Switch + CPA + CCR + CC + OpenCode", "zh-TW": "Title: Building a Free AI Programming Army with CC-Switch + CPA + CCR + CC + OpenCode", "ja": "Title: 建立免费的AI编程军队,使用CC-Switch + CPA + CCR + CC + OpenCode", "ko": "Title: 使用CC-Switch + CPA + CCR + CC + OpenCode构建免费的AI编程军队", "de":="Titel: Aufbau einer kostenlosen AI-Programmierungstruppe mit CC-Switch + CPA + CCR + CC + OpenCode", "fr": "Titre: Construire une armée de programmation AI gratuite avec CC-Switch + CPA + CCR + CC + OpenCode", "it": "Titolo: Costruire un esercito di programmazione AI gratuito con CC-Switch + CPA + CCR + CC + OpenCode"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | "en": "Title: Competing for 'OpenClaw Moment', Wuxi High New District Proposes to Issue 'Twenty-two Claws of Dragon虾'", "zh-TW": "標題:爭奪「開放爪子時刻」,無錫高新區拟發行「十二條龍虾」", "ja": "タイトル:「オープンクースの瞬間」を巡り、無錫高新区が『二十二本爪の龙虾』を発行する予定", "ko": "제목: '오픈클로우 순간'을 위해 경쟁하며, 우시 고신구(High New District)가 '12개의 용의 발톱龙虾'를 발행할 예정", "de": "Titel: Um den ‚Momente des OpenClaw‘ zu konkurrieren, plant der Hochneue Bezirk von Wuxi die Ausgabe von ‚Zwölf Klauen des Drachen虾‘", "fr": "Titre : Pour rivaliser pour le « moment OpenClaw », le Nouveau Haut-Région de Wuxi prévoit d'émettre les « Douze griffes du dragon虾 »", "it": "Titolo: Per competere per il 'momento dell\'OpenClaw', il Nuovo Distretto di Alta Tecnologia di Wuxi prevede l'emissione delle 'Dodecine di artigli del drago虾'" } <|im_start|>assistant { "en": "Title: Competing for 'OpenClaw Moment', Wuxi High New District Proposes to Issue 'Twenty-two Claws of Dragon虾'", "zh-Hans": "标题:争夺‘开放爪子时刻’,无锡高新区拟发行‘十二条龙虾’", "ja": "タイトル:「オープンクースの瞬間」を巡り、無錫高新区が『二十二本爪の龙虾』を発行する予定", "ko": "제목: '오픈클로우 순간'을 위해 경쟁하며, 우시 고신구(High New District)가 '12개의 용의 발톱龙虾'를 발행할 예정", "de": "Titel: Um den ‚Momente des OpenClaw‘ zu konkurrieren, plant der Hochneue Bezirk von Wuxi die Ausgabe von ‚Zwölf Klauen des Drachen虾‘", "fr": "Titre : Pour rivaliser pour le « moment OpenClaw », le Nouveau Haut-Région de Wuxi prévoit d'émettre les « Douze griffes du dragon虾 »", "it": "Titolo: Per competere per il 'momento dell\'OpenClaw', il Nuovo Distretto di Alta Tecnologia di Wuxi prevede l'emissione delle 'Dodecine di artigli del drago虾'" } |
Critical
[Accuracy]
"Twenty-two Claws of Dragon虾"
Reason: The source “龙虾十二条” means 12 policies; it is mistranslated as "Twenty-two" and mixes Chinese and English in “Dragon虾”, so both the number and the meaning are wrong. | Hypothesis provides multilingual output with incorrect translations and mixed languages. Should be single English translation: 'Competing for the OpenClaw Moment, Wuxi High-tech Zone plans to introduce 12 Policies for OpenClaw'. | Severe mistranslation. '龙虾十二条' refers to '12 policies/measures for OpenClaw', not '22 claws'. The number and meaning are wrong. | Mistranslation: '龙虾十二条' refers to '12 Policies for OpenClaw' (a set of policies), not 'Twenty-two Claws of Dragon虾'. The reference correctly translates this as '12 Policies for OpenClaw'. |
| CohereLabs/tiny-aya-water | Title: Competing for 'OpenClaw Moment', Wuxi High New District Proposes to Issue 'Twenty-Four Claws of Dragon and Crab' |
Major
[Accuracy]
"'Twenty-Four Claws of Dragon and Crab'"
Reason: Mistranslation: '龙虾十二条' (12 policies for OpenClaw) was incorrectly translated as 'Twenty-Four Claws of Dragon and Crab'. The number is wrong and the meaning is distorted. | Severe mistranslation of '龙虾十二条' (12 Policies for Lobster/OpenClaw). Interprets '12' as '24' and 'Policies' as 'Claws of Dragon and Crab'. | Mistranslates '龙虾十二条' as 'Twenty-Four Claws of Dragon and Crab'. Should be '12 Policies for OpenClaw' - '十二条' means 12 items/policies, not 24. | The source “龙虾十二条” means 12 policies and has nothing to do with "twenty-four" or "dragon and crab"; both the number and the imagery are wrong. | Should be '12 Policies for OpenClaw'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Compete for the 'OpenClaw Moment', Wuxi Hi-tech Zone Plans to Issue 'Twelve Measures on Lobster' Content: Source: ...Wuxi Hi-tech Zone plans to introduce 'Twelve Measures on Lobster'...to support OpenClaw open-source AI agents... | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Striving for the ‘OpenClaw Moment’, Wuxi High-tech Zone Plans the ‘Shrimp Twelve Articles’ Content: Source: ...Wuxi High-tech Zone plans to launch the ‘Shrimp Twelve Articles’... Supporting OpenClaw open-source AI agents... |
Major
[Accuracy]
"‘Shrimp Twelve Articles’"
Reason: Mistranslation of '龙虾' (OpenClaw) as 'Shrimp'. 'Twelve Articles' is too literal for '十二条' (12 Policies/Measures). | Should be ''12 Policies for OpenClaw'' - 龙虾十二条 refers to OpenClaw policies, not literal shrimp | In context this is policy branding tied to OpenClaw ('龙虾十二条'); '12 Policies for OpenClaw' is clearer. 'Shrimp' misreads the established OpenClaw metaphor. |
| Qwen/Qwen3-14B | Concorrenza per il momento "OpenClaw", il distretto tecnologico di Wuxi sta pianificando l'introduzione delle "Dodici Linee del Granchio" |
Minor
[Accuracy]
""Dodici Linee del Granchio""
Reason: Wrong language: hypothesis is in Italian but target language is Chinese. 'Granchio' (crab) is incorrect - should be 'OpenClaw' or '小龙虾'. 'Dodici Linee' (Twelve Lines) should be '十二条' (12 policies). | Hypothesis is in Italian, not English/Chinese. '龙虾十二条' should be '12 Policies for OpenClaw' not 'Twelve Lines of Crab'. Complete language and terminology mismatch. | Mistranslation of '龙虾十二条' (12 Policies for OpenClaw). Translated '龙虾' as literal 'Crab' and '条' (policies/clauses) as 'Lines'. | “龙虾十二条” here are twelve policies for OpenClaw; “linee del granchio” mistranslates both the policy nature and the OpenClaw name. |
| Qwen/Qwen3.5-4B | Title: Contesting the 'OpenClaw Moment', Wuxi High-tech Zone Plans to Issue 'Twelve Rules for Crawfish' | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Compete for the 'OpenClaw Moment', Wuxi High-tech Zone Plans to Introduce 'Lobster Twelve Articles' Content: Source: ...Wuxi High-tech Zone plans to launch 'Lobster Twelve Articles'... Supporting OpenClaw open-source AI agents... |
Major
[Terminology]
"'Lobster Twelve Articles'"
Reason: 'Lobster' should be 'OpenClaw' - '龙虾' is Chinese name for OpenClaw; '十二条' should be '12 Policies' not 'Twelve Articles' | Minor variation. Reference uses '12 Policies for OpenClaw' which is more specific and accurate. | "龙虾十二条" refers to 12 policy measures for OpenClaw; "Lobster Twelve Articles" is overly literal and obscures policy nature. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Struggle for "OpenClaw Moment", Nantong High-tech Zone to Issue "Dragon Shrimp Twelve Clauses"\nContent: Source: ...Nantong High-tech Zone to Launch "Dragon Shrimp Twelve Clauses"...Support OpenClow Open Source AI Body... |
Major
[Accuracy]
"Nantong High-tech Zone"
Reason: Mistranslation: '无锡' (Wuxi) is incorrectly translated as 'Nantong'. | Misidentifies location; source specifies Wuxi High-tech Zone, not Nantong. | The translation incorrectly translates '无锡高新区' as 'Nantong High-tech Zone' instead of 'Wuxi High-tech Zone'. | Geographical error. '无锡' is Wuxi, not Nantong. | Location is wrong - should be 'Wuxi' not 'Nantong'. '龙虾' should be 'OpenClaw', not 'Dragon Shrimp'. Content section has typo 'OpenClow' instead of 'OpenClaw'.
Major
[Accuracy]
"Dragon Shrimp Twelve Clauses"
Reason: Mistranslation: '龙虾十二条' (12 Policies for OpenClaw) is translated literally as 'Dragon Shrimp Twelve Clauses' instead of the proper name. | Mistranslation of slang. '龙虾' is 'OpenClaw'. 'Twelve Clauses' is a literal translation of '十二条', which refers to '12 Policies' or '12 Measures'. | "龙虾" is branding for OpenClaw; translating as "Dragon Shrimp" loses connection to OpenClaw and sounds literal. |
| google/gemma-3-12b-it | Title: Compete for "OpenClaw Moment", Wuxi High-Tech Zone Plans to Introduce "Twelve Lobster Articles" |
Major
[Accuracy]
""Twelve Lobster Articles""
Reason: The term 'Lobster' is incorrect; it should be 'OpenClaw.' | “龙虾十二条” is policy items for OpenClaw; “Lobster Articles” literalizes and loses policy/brand nuance; better “12 Policies for OpenClaw” as in reference. | Missing content translation; only title provided, source includes content about policies and support | The hypothesis omits the content section. Additionally, 'Twelve Lobster Articles' is a literal translation of '龙虾十二条' which should be '12 Policies for OpenClaw' to convey the meaning of regulations/policies. |
| google/gemma-3-1b-it | "en": "Source: ...No锡高新区拟推‘龙虾十二条’...", "zh-TW": "来源:...无锡高新区拟推‘龙虾十二条’...支持OpenClaw开源ai智能体...", "ja": "ソース:...無錫高新区擬推‘龍蝦十二條’…”, "ko": "แหล่งที่มา: ...無錫高新区擬推‘龍蝦十二條’... 支持 OpenClaw 開源 AI 智能體...", "de": "Quelle: ...No Sintonisierungsgruppe plant die Einführung von ‘Drachenfisch zwölf Tore’...", "fr": "Source : ... No Sontionisation groupe propose l'introduction de ‘Poisson dragon douze points’...", "it": "Fonte: ... No Sintonizzazione gruppo propone l'introduzione di ‘Pesce dragone dodici punti’..." } |
Minor
[Accuracy]
""en": "Source: ...No锡高新区拟推‘龙虾十二条’..."
Reason: Contains garbled text (“No锡”) and omits the policy/‘12 Policies for OpenClaw’ meaning; not a proper English translation. | Hypothesis is not a proper translation but a placeholder with multiple language entries - completely incorrect output | The hypothesis is a malformed JSON object with broken translations in multiple languages, failing to provide a coherent English translation of the source. | The hypothesis does not match the reference translation. |
| google/gemma-3-4b-it | Competing for ‘OpenClaw Moments’, Wuxi High-Tech Zone Plans to Issue ‘Dragon Claw Twelve Rules’ |
Major
[Accuracy]
"Dragon Claw Twelve Rules"
Reason: "龙虾十二条" in context is policy measures for OpenClaw, not literal dragon claw rules; translation should reflect "12 Policies for OpenClaw"; current version misleads. | The source '龙虾十二条' is a pun referring to 'OpenClaw' (nicknamed Lobster). Translating it as 'Dragon Claw' loses the specific reference to the software and the pun. It should be '12 Policies for OpenClaw' or similar. | Should be '12 Policies for OpenClaw' as per reference. |
| google/translategemma-12b-it | Title: Competing for the "OpenClaw Moment", Wuxi High-Tech Zone Plans to Introduce "Twelve Measures for OpenClaw" | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Competing for "OpenClaw Moment", Wuxi High-Tech Zone to issue "Twelve Lobster Rules" | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '争夺‘OpenClaw时刻’,无锡高新区拟出台‘龙虾十二条’', "zh-TW": "翻译为‘争夺‘OpenClaw时刻’,无锡高新区拟推出‘龙虾十二条’', "ja": "翻译为‘争夺‘OpenClaw时刻’,无锡高新区拟推出‘龙虾十二条’', "ko": "翻译为‘争夺‘OpenClaw时刻’,无锡高新区拟推出‘龙虾十二条’", "de": "Translation of ‘Ausschreiten um die ‘OpenClaw-Momente’, Wuxi High-tech Zone plant den Start von ‘Lobster Twelve’", "fr": "Translation of ‘Championner les ‘Moments OpenClaw’, la zone de haute technologie de Wuxi propose d’introduire ‘Lobster Twelve’”, "it": "Traduzione di ‘Concorrenza per i ‘Momenti OpenClaw’, la zona tecnologica di Wuxi propone l’introduzione di ‘Lobster Twelve’}" }, "input": "标题:争夺‘OpenClaw时刻’,无锡高新区拟出台‘龙虾十二条’\n内容:来源:...无锡高新区拟推‘龙虾十二条’...支持OpenClaw开源AI智能体..." {"en": "The title is “Struggle for the ‘OpenClaw Moment’, Wuxi High-tech Zone plans to introduce ‘Lobster Twelve’", "zh-TW": "標題是「爭奪‘OpenClaw時刻’,無錫高新區計劃推出‘龍蝦十二條’」", "ja": "タイトルは「OpenClawの瞬間を争う、無錫高新区が‘ロブスター12条’を導入する」", "ko": "제목은 ‘OpenClaw 순간을 두고 싸우는, 무석 고신도시가 ‘로브스터 트윈트를 도입하기’", "de": "Die Überschrift lautet „Kampf um den ‚OpenClaw-Moment‘ – Die Wuxi-Hightech-Zone plant den Start von ‚Lobster Twelve‘“, "fr": "Le titre est « Lutte pour le moment OpenClaw : la zone de haute technologie de Wuxi prévoit l’introduction de ‘Lobster Twelve’ », "it": "Il titolo è “Contendere per il ‘Momento OpenClaw’: la zona tecnologica di Wuxi pianifica l’introduzione di ‘Lobster Twelve’”} |
Critical
[Accuracy]
"Translation of '争夺'OpenClaw 时刻'..."
Reason: The translation lacks context and clarity compared to the reference. | The hypothesis is a malformed JSON block containing placeholders and literal translations of 'Lobster' instead of the intended meaning (OpenClaw policies). It fails to provide a clean translation. | The hypothesis is not a proper translation but a placeholder showing multiple language translations. It fails to provide the actual translated content in the target language (zh). |
| tencent/HY-MT1.5-7B | "en": "Title: Competing for the ‘OpenClaw Moment’ – Wuxi High-Tech Zone Plans to Introduce the ‘Twelve Rules for Lobsters’ ", "zh-TW": "标题:爭奪‘開爪時刻’——無錫高新區計劃推出‘十二條龍蝦規則’", "ja": "タイトル:『オープンクローの瞬間』を争う――無錫高新技術区が『十二の龍蝦規則』を導入する予定", "ko": "제목: ‘오픈클로의 순간’을 놓고 경쟁하다 – 무석 고신기술구가 ‘십이조각의 용갈 규칙’을 도입할 예정", "de":="Titel: Um den ‚Offenen Klauen-Moment‘ konkurrieren – Die Wuxi-Hochtechnologiezone plant die Einführung der ‚Zwölf Regeln für Lobster‘", "fr":="Titre : Concourir pour le ‚Moment des Griffes Ouvertes‘ – La zone haute technologie de Wuxi prévoit d’introduire les ‚Douze Règles du Homard‘", "it":="Titolo: Competere per il “Momento delle Clavi Aperte” – La zona alta tecnologia di Wuxi intende introdurre le “Dodici Regole del Gambero”"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: About using tag airport issues |
Major
[Accuracy]
"tag airport"
Reason: Mistranslation: '机场' in this context refers to 'proxy provider/VPN service' (internet slang), not literal 'airport'. The reference correctly uses 'TAG proxy provider'. | Mistranslation of slang. '机场' in this context means a proxy/VPN service provider, not a literal airport. | In context, “机场” is a circumvention proxy ("airport") service and should be translated as proxy/relay/VPN provider; the literal translation "airport" is misleading. |
| CohereLabs/tiny-aya-water | Title: Questions about using tag airport |
Major
[Accuracy]
"tag airport"
Reason: Incomplete. Should be 'Regarding the usage issue of TAG proxy provider' to clarify that 'tag' is a proxy provider name and '机场' means proxy provider. | In context, “机场” means "proxy/VPN service provider"; the literal translation "airport" is not idiomatic and is easily misunderstood. | Literal translation of '机场' (slang for proxy provider/VPN service) as 'airport'. |
| Qwen/Qwen2.5-14B-Instruct | Title: About tag proxy provider usage issues Content: Purchased tag's proxy provider, there are indeed many nodes. However, there is an issue during use... | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: About the Usage of tag Proxy Provider Content: Bought tag's proxy provider, there are indeed many nodes. But there is an issue during usage... | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: About the use of tag proxy provider Content: I bought tag's proxy provider and there are indeed many nodes. However, there is a problem when using it... | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Regarding issues with using tag airports Content: Bought tag's proxy providers, there are indeed many nodes. However, there is an issue during usage... |
Major
[Terminology]
"tag airports"
Reason: Incorrect - '机场' in this context means 'proxy providers', should be 'TAG proxy provider' | Should be 'TAG proxy provider' | Should be 'TAG proxy provider' - '机场' is slang for proxy provider, not literal airports |
| Qwen/Qwen3.5-9B | Title: Question about using tag proxies Content: I bought a proxy provider with many nodes. However, there is an issue when using... |
Minor
[Terminology]
"tag proxies"
Reason: Should be 'TAG proxy provider' to match reference more closely | Service name "TAG" should be capitalized as a brand and "proxies" is less precise than "proxy provider"/"airport" in context. | Capitalization. 'TAG' is likely a proper noun/brand name and should be capitalized. | Minor variation. Reference's 'Regarding the usage issue of TAG proxy provider' is more specific.
Minor
[Accuracy]
"a proxy provider with many nodes"
Reason: Should be 'the TAG proxy subscription' to match '买了tag的机场' meaning purchased subscription, not just 'bought a provider' | Drops explicit "TAG" brand in the content sentence present in source and reference. | Source '机场' is slang for 'proxy subscription/service'. 'Proxy provider' is accurate. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: About the issue with using tag airport | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: Questions about using tag proxy providers Content: I bought many nodes from tag proxy providers. However, there is an issue when using them... |
Major
[Accuracy]
"tag proxy providers"
Reason: Plural “providers” is misleading; original refers to a specific provider/service “TAG 机场” in singular. | Mistranslation: '买了tag的机场' means 'bought TAG proxy subscription', not 'bought many nodes from tag proxy providers' | Misinterpretation of '买了 tag 的机场' (Bought the TAG proxy service/subscription). The hypothesis implies buying nodes from multiple providers named 'tag', rather than a subscription from one provider named 'TAG'. |
| google/gemma-3-1b-it | Translation of '关于tag机场的使用问题' in en |
Major
[Accuracy]
"Translation of '关于tag机场的使用问题' in en"
Reason: The hypothesis contains a meta-instruction describing the task instead of providing the actual translation of the source text. | Meta description, not an English translation; omits that it is about usage issues of the TAG proxy provider. | Hypothesis is incomplete placeholder text. Should provide actual translation like 'Regarding the usage issue of TAG proxy provider' or similar. | The hypothesis only translates the title and not the entire content, making it incomplete. | Hypothesis is not a translation but a placeholder indicating translation was not performed |
| google/gemma-3-4b-it | Title: Regarding the Usage Issues of tag Airport Content: I purchased a tag airport proxy node and there are indeed many nodes. However, there is an issue when using it... |
Minor
[Terminology]
"tag Airport"
Reason: "机场" here means VPN/proxy service; "airport" literal translation is nonstandard, though community users might understand; reference correctly uses "proxy provider". | '机场' in this context is slang for 'proxy provider' or 'VPN service', not a literal 'Airport'. Translating it as 'Airport' is a severe misunderstanding of the jargon. | Should be 'TAG proxy provider' instead of 'tag Airport'. 'Airport' is not standard terminology for proxy services. |
| google/translategemma-12b-it | Title: Questions about using tag proxy providers. Content: I bought tag proxy providers, and there are indeed many nodes. However, there is an issue when using them... | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Regarding the use of tag airports |
Major
[Terminology]
"tag airports"
Reason: Should be 'TAG proxy provider' - incomplete terminology translation | Mistranslation: '机场' (proxy provider) was translated as 'airport' instead of 'proxy provider' as in reference. Also 'tag' should be 'TAG' in proper capitalization. | The term 'airports' is used instead of 'proxy provider.' |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 关于tag机场的使用问题' in en |
Critical
[Accuracy]
"Translation of 'Title: 关于tag机场的使用问题' in en"
Reason: The translation lacks context and clarity compared to the reference. | The hypothesis contains no translation, only a placeholder string. The content is also missing. | The hypothesis is not a proper translation but a placeholder indicating translation to English. It fails to provide the actual translated content. | Only a meta description; it does not actually translate the title about usage issues of the TAG proxy provider/airport. |
| tencent/HY-MT1.5-7B | Title: Questions about Using the tag 'Airport' | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【OpenClaw Model Rankings】What model is suitable for raising crabs? |
Critical
[Accuracy]
"What model is suitable for raising crabs?"
Reason: Mistranslation of '养龙虾' (raising lobster/OpenClaw). Should be 'running OpenClaw' not 'raising crabs'. '龙虾' is a slang term for OpenClaw in this context. | Mistranslation of slang '养龙虾' (raising lobsters/OpenClaw). The hypothesis translates '龙虾' as 'crabs' and interprets the slang literally as animal husbandry, whereas the context refers to running the OpenClaw software. | Misinterprets metaphorical “养龙虾” (run/use OpenClaw) as literally raising crabs and changes species; should refer to using/running OpenClaw, not crabs. |
| CohereLabs/tiny-aya-water | Title: 【OpenClaw Model Rankings】What model is suitable for raising crabs? |
Critical
[Accuracy]
"What model is suitable for raising crabs?"
Reason: Mistranslation of '养龙虾' (raising lobsters/running OpenClaw). Should be 'running OpenClaw' not 'raising crabs'. The source uses 龙虾 (lobster) as a metaphor for OpenClaw, not crabs. | Mistranslation of slang '养龙虾' (raising lobsters/OpenClaw). The hypothesis translates '龙虾' as 'crabs' and interprets the slang literally as animal husbandry, whereas the context refers to running the OpenClaw software. | Mistranslates "养龙虾" which here metaphorically refers to running OpenClaw; also "crabs" is the wrong animal term. |
| Qwen/Qwen2.5-14B-Instruct | Title: [OpenClaw Model Ranking] Which model is suitable for raising lobsters? Content: This morning I saw the founder of OpenClaw share an open-source project's benchmark test on openlaw... |
Major
[Accuracy]
"openlaw"
理由: Incorrectly transcribed 'openclaw' as 'openlaw', changing the name of the project. | Hypothesis says 'openlaw benchmark test' but should be 'OpenClaw benchmark test' - misidentifies the project name | Misrecognition of product name; should remain OpenClaw benchmark, not openlaw. |
| Qwen/Qwen2.5-7B-Instruct | Title: [OpenClaw Model Ranking] What Model is Suitable for Keeping Lobsters? Content: This morning, I saw the founder of OpenClaw share an open-source project's openlaw benchmark test... |
Major
[Accuracy]
"What Model is Suitable for Keeping Lobsters?"
理由: Misinterprets metaphorical ‘养龙虾’ (running OpenClaw) as literally keeping lobsters. | Mistranslation of '养龙虾' - should refer to 'running OpenClaw' not 'keeping lobsters'. The reference correctly interprets this as 'Which model is suitable for running OpenClaw' | Failed to translate the slang '养龙虾' (raising lobsters) which refers to running the OpenClaw software. The translation interprets it literally as animal husbandry.
Major
[Accuracy]
"openlaw benchmark test"
理由: Should be 'OpenClaw benchmark test' not 'openlaw benchmark test'. The source mentions 'openclaw' and 'openlaw' appears to be a typo or misreading | Mistranslates ‘openclaw 基准测试’ as ‘openlaw benchmark test’ and omits that the benchmark is for OpenClaw. | Incorrect transcription of the project name 'OpenClaw' as 'openlaw'. |
| Qwen/Qwen3-14B | 【OpenClaw-Modell-Rankings】Welches Modell eignet sich zum Züchten von Krabben? |
Major
[Accuracy]
"Welches Modell eignet sich zum Züchten von Krabben?"
Reason: In context, ‘养龙虾’ refers to running/using the OpenClaw model, not literally raising crabs; rendering “龙虾” as “Krabben” and turning the software context into real animal farming distorts the meaning. | Target language is zh (Chinese) but hypothesis is in German. Wrong target language used. | Mistranslation of slang '龙虾' (lobster/OpenClaw) as 'crabs'. The reference and context imply 'OpenClaw', not literal crustaceans. |
| Qwen/Qwen3.5-4B | Title: [OpenClaw Model Rankings] Which model is suitable for raising clawed lobsters? Content: This morning I saw the founder of OpenClaw share an open-source project's OpenLaw benchmark test... |
Major
[Accuracy]
"养龙虾 / raising clawed lobsters"
理由: Mistranslated as 'raising clawed lobsters' when it's a metaphor for 'running OpenClaw'. Should be 'running OpenClaw' not 'raising lobsters' | Misinterprets metaphorical ‘养龙虾’ which in context refers to running/using OpenClaw, not literally raising lobsters. | Mistranslation of slang '养龙虾' (raising lobsters) which refers to running/hosting the OpenClaw software, not actual animal husbandry.
Major
[Accuracy]
"openlaw"
理由: Incorrect entity name; should be 'OpenClaw' based on context and source 'openclaw'. | Incorrect translation: 'openlaw' should be 'OpenClaw' - the source says 'openclaw基准测试' (OpenClaw benchmark test), not OpenLaw. | Mistranscription of product name; should be OpenClaw benchmark, not OpenLaw. | Mistranslated as 'OpenLaw benchmark test' when source says 'openlaw' but context indicates 'OpenClaw benchmark test' |
| Qwen/Qwen3.5-9B | Title: [OpenClaw Model Rankings] What model is suitable for raising lobsters? Content: This morning I saw the OpenClaw founder share an open source project's OpenLaw benchmark test... |
Minor
[Accuracy]
"养龙虾 / raising lobsters"
理由: Literal translation of slang '养龙虾' (running OpenClaw) instead of the intended technical meaning, leading to nonsensical output. | Misinterprets the metaphorical use of “养龙虾” (running/using OpenClaw) as literal lobster raising; changes topic meaning. | Mistranslation of '养龙虾' (running/using OpenClaw) as 'raising lobsters'. Should be 'Which model is suitable for running OpenClaw?' | Mistranslation: 'raising lobsters' should be 'running OpenClaw' - this is a slang term for the software, not actual lobsters
Major
[Accuracy]
"openlaw / OpenLaw"
理由: Source says “openclaw” but hypothesis uses “openlaw/OpenLaw”, referring to a different project. | Incorrect entity name; should be 'OpenClaw' based on context and source 'openclaw'. | Incorrect term: 'openlaw' should be 'OpenLaw' (proper name) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | What Model is Suitable for Rearing Lobsters? |
Minor
[Accuracy]
"What Model is Suitable for Rearing Lobsters?"
理由: Mistranslates the product name OpenClaw as literal lobsters/rearing lobsters; should refer to running/using OpenClaw models, not animal husbandry. | The translation interprets the slang term '养龙虾' (raising lobsters, meaning running OpenClaw) literally as farming actual lobsters, completely missing the technical context established in the source and reference. | Missing context marker '[OpenClaw Model Ranking]' and mistranslates '养龙虾' (running OpenClaw) as 'rearing lobsters' instead of the technical meaning | The translation does not mention 'OpenClaw' which is a key term in the reference. |
| google/gemma-3-12b-it | Title: [OpenClaw Model Ranking] Which model is suitable for raising lobsters? |
Critical
[Accuracy]
"Which model is suitable for raising lobsters?"
理由: Incorrect translation of '养龙虾' as 'raising lobsters' instead of 'running OpenClaw' | The source term '养龙虾' is slang for running the 'OpenClaw' software. The hypothesis translates it literally as raising actual lobsters, completely missing the context and meaning. | Misinterprets the metaphorical '养龙虾' (running OpenClaw) literally as raising lobsters instead of running OpenClaw. | Mistranslation of '养龙虾' (slang for running/deploying OpenClaw). Should be 'running OpenClaw' not 'raising lobsters' | Mistranslation: '养龙虾' is slang for 'running OpenClaw', not literal raising of lobsters. Should be 'running OpenClaw'. |
| google/gemma-3-1b-it | What models are suitable for raising lobsters? |
Major
[Accuracy]
"What models are suitable for raising lobsters?"
理由: Completely misses the context. '龙虾' is a slang term for OpenClaw model, not literal lobsters. Should translate to something about OpenClaw model ranking. | The translation does not match the reference's context, which refers to running OpenClaw. | Mistranslates key term OpenClaw as literal lobsters and omits model ranking / OpenClaw context from title and benchmark test from content. | The hypothesis translates the slang term '养龙虾' (raising lobsters, referring to running OpenClaw) literally as raising actual lobsters, completely missing the technical context established in the source and reference. | Incorrect translation: 'lobsters' should be 'OpenClaw' (the slang term for OpenClaw). Also '养' (raise/keep) should be 'run/use' not 'raise'. |
| google/gemma-3-4b-it | 【OpenClaw Lobster Model Ranking】Which model is suitable for raising lobsters? |
Major
[Accuracy]
"raising lobsters"
理由: Mistranslation of '养龙虾' (running/maintaining OpenClaw) as literally raising animals. | Should be 'Which model is suitable for running OpenClaw' - the reference shows this is about OpenClaw models, not actual lobsters | Misinterprets metaphorical ‘养龙虾’ which here means running/using OpenClaw, not literally raising lobsters. |
| google/translategemma-12b-it | Title: [OpenClaw Model Ranking] Which model is suitable for raising lobsters? |
Major
[Accuracy]
"Which model is suitable for raising lobsters?"
理由: Failed to recognize '养龙虾' (raising lobsters) as slang for running/deploying the 'OpenClaw' software. The translation interprets it literally, changing the meaning entirely. | The term 'lobsters' should be translated as 'OpenClaw' to maintain accuracy. | Mistranslation of '养龙虾' (slang for running/deploying OpenClaw). Should be 'running OpenClaw' not 'raising lobsters' | Misinterprets metaphorical “养龙虾” (running OpenClaw) literally as raising lobsters, changing the meaning. |
| google/translategemma-4b-it | "en": "Title: [OpenClaw Model Ranking] Which model is suitable for raising lobsters?", "zh-TW": "標題:【OpenClaw 模型排行】什麼模型適合養龍蝦?", "ja": "タイトル:[OpenClaw モデルランキング] 何のモデルがエビを飼育するのに適していますか?", "ko": "제목: [OpenClaw 모델 순위] 어떤 모델이 게를 키우는 데 적합한가?", "de": "Titel: \[OpenClaw Modell-Ranking] Welches Modell eignet sich zum Züchten von Krebs?", "fr": "Titre : \[Classement des modèles OpenClaw] Quel modèle est adapté à l'élevage de crabes ?", "it": "Titolo: \[Classifica modelli OpenClaw] Quale modello è adatto per allevare granchi ?" } <|im_end|> |
Critical
[Accuracy]
"raising lobsters"
理由: Failed to translate the slang '养龙虾' (raising lobsters) which refers to running/hosting the OpenClaw model. The translation interprets it literally as farming crustaceans, completely missing the technical context. | Source uses '龙虾' (lobster) as slang for OpenClaw model. Hypothesis translates literally as 'raising lobsters' but reference correctly interprets as 'running OpenClaw' | Incorrect translation: 'raising lobsters' should be 'running OpenClaw' - the source uses '龙虾' (lobster) as slang for OpenClaw, not actual lobsters |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾' in English |
Major
[Accuracy]
"Translation of '龙虾' in English"
理由: Does not translate the title or content; just repeats a meta-instruction. Fails to render meaning, including the OpenClaw term. | Hypothesis is incomplete and does not provide an actual translation. '龙虾' (lobster) is a slang term for OpenClaw in this context, but the hypothesis fails to translate it or provide meaningful content. | The hypothesis is a meta-commentary describing a translation task rather than providing the actual translation of the source text. It fails to translate the title or content. | The hypothesis does not provide a complete translation of the source text. | Hypothesis is a literal translation of a single word 'lobster' instead of translating the full title and content. The source is about OpenClaw model ranking, not actual lobsters. |
| tencent/HY-MT1.5-7B | "en": "Title: [OpenClaw Model Ranking] Which model is suitable for raising lobsters?", "zh-TW": "Title: [OpenClaw Model Ranking] Which model is suitable for raising lobsters?", "ja": "Title: [OpenClaw Model Ranking] どのモデルがロブスターを飼育するのに適しているか?", "ko": "Title: [OpenClaw Model Ranking] 어떤 모델이 로브스터를 기르기에 적합한가?", "de":="Titel: [OpenClaw Model Ranking] Welcher Modell eignet sich zum Züchten von Lobstern?", "fr": "Titre: [OpenClaw Model Ranking] Quel modèle est adapté à l’élevage des homards?", "it": "Titolo: [OpenClaw Model Ranking] Quale modello è adatto all’allevamento dei gamberi di mare?" }|im_end|> |
Critical
[Accuracy]
"Which model is suitable for raising lobsters?"
理由: Mistranslation of '养龙虾' (running/deploying OpenClaw). Reference shows it should be 'running OpenClaw', not 'raising lobsters'. '龙虾' is a slang term for OpenClaw in this context. | Failed to translate the slang term '养龙虾' (raising lobsters) which refers to running/deploying the 'OpenClaw' model. The translation interprets it literally as farming actual lobsters, completely missing the context established in the source and reference. | Misinterprets metaphorical ‘养龙虾’ (run OpenClaw) as literal raising lobsters; should be about running OpenClaw |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: The Simplest Way to Install Dragonfly |
Critical
[Accuracy]
"Dragonfly"
理由: Mistranslation of the product name “大龙虾/OpenClaw”; “Dragonfly” is a different concept and breaks terminology consistency. | Incorrect translation of the slang term '大龙虾' (Big Lobster/OpenClaw). 'Dragonfly' is a completely different insect and loses the specific community reference to OpenClaw. | Mistranslation of '安装大龙虾' (install OpenClaw). Should be 'install OpenClaw' not 'install Dragonfly'. '龙虾' is slang for OpenClaw. |
| CohereLabs/tiny-aya-water | Title: The Simplest Way to Install Dragonfly |
Critical
[Accuracy]
"Dragonfly"
理由: Incorrect translation of the slang term '大龙虾' (Big Lobster/OpenClaw). 'Dragonfly' is a completely different insect and loses the specific community reference to OpenClaw. | Mistranslation of '大龙虾' (OpenClaw). Should be 'OpenClaw' not 'Dragonfly'. The source uses 龙虾 (lobster) as a metaphor for OpenClaw. | Mistranslates "大龙虾" which refers to OpenClaw; "Dragonfly" is incorrect. |
| Qwen/Qwen2.5-14B-Instruct | Title: The simplest way to install the big lobster Content: The premise is having a live big lobster. Set up the new device for root passwordless login and let it replicate one. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: The Easiest Way to Install Big Lobster Content: First, you need a live big lobster. Set up the new machine as root SSH login to make it copy one over. |
Major
[Accuracy]
"Big Lobster"
理由: Failed to translate the slang '大龙虾' (Big Lobster) which refers to the OpenClaw software. Interpreted literally. | Should be 'The easiest way to install OpenClaw' not 'Big Lobster'. The slang term '大龙虾' refers to OpenClaw, not a literal lobster | Literal translation of ‘大龙虾’ instead of intended product name OpenClaw.
Minor
[Accuracy]
"root SSH login"
理由: Source says 'root 免密登录' (root passwordless login). Hypothesis says 'root SSH login', missing the crucial 'passwordless' aspect. | Adds “SSH” which is not in the source; source only mentions root passwordless login. | Incorrect translation: 'root SSH login' should be 'root passwordless login' as in the reference. | Incomplete and inaccurate translation. Should be 'Set the new machine to root passwordless login, and let it clone itself' - missing the concept of passwordless login and cloning |
| Qwen/Qwen3-14B | Sulla metodologia più semplice per l'installazione del granchio enorme |
Major
[Accuracy]
"Sulla metodologia più semplice per l'installazione del granchio enorme"
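Reason: Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | Complete failure to translate slang '大龙虾' (OpenClaw). Translated literally as 'huge crab' instead of the software name. | The source uses “安装大龙虾” figuratively to mean installing OpenClaw, and “大龙虾” is used by the project team as a nickname for OpenClaw; the translation renders it as a literal “huge crab” and drops the OpenClaw name entirely, misreading the core meaning. |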
| Qwen/Qwen3.5-4B | Title: The Easiest Way to Install Big Lobster Content: Prerequisite is having a live Big Lobster. Set up the new machine for root passwordless login and let it clone one. |
Major
[Accuracy]
"安装大龙虾 / Install Big Lobster"
理由: ‘大龙虾’ is a nickname for OpenClaw; translation treats it literally instead of as the software name. | Failed to translate the slang term '大龙虾' which refers to the software 'OpenClaw', interpreting it literally as the animal. | Mistranslated as 'Big Lobster' when it's a metaphor for 'OpenClaw'. Should be 'OpenClaw' not 'Big Lobster' |
| Qwen/Qwen3.5-9B | Title: The Easiest Way to Install Big Lobster Content: First, you need a live big lobster. Set up the new machine for root passwordless login so it can copy one out. |
Minor
[Accuracy]
"Install Big Lobster / live big lobster"
理由: Literal translation of slang '大龙虾' (OpenClaw) instead of the proper noun, causing loss of meaning. | Mistranslation - should be 'The easiest way to install OpenClaw', not 'Big Lobster' | Mistranslation: 'Big Lobster' should be 'OpenClaw' - this is a software name, not a literal big lobster | “大龙虾” is a slang/product name (OpenClaw), not a literal big lobster; misinterprets technical context as literal animal.
Major
[Accuracy]
"copy one out"
理由: Should be 'clone itself' not 'copy one out' | Inaccurate translation of '复制一个出来' in this context; 'clone itself' is more appropriate for software instances. | Should be “clone itself”; “copy one out” is vague about self-cloning of the instance. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | About the simplest way to install the largest shrimp |
Minor
[Accuracy]
"largest shrimp"
理由: The translation does not mention 'OpenClaw' which is a key term in the reference. | The translation fails to recognize '大龙虾' as slang for the software 'OpenClaw', translating it literally as a large crustacean, which renders the title nonsensical in the technical context. | Completely misses that '大龙虾' is a metaphor for OpenClaw; translates literally as 'largest shrimp' instead of 'OpenClaw' | Misinterprets 大龙虾, which in context is the product OpenClaw, not a literal large shrimp. |
| google/gemma-3-12b-it | Title: The Easiest Way to Install Big Lobster Content: The prerequisite is a live, large lobster. Set up the new machine as root login without password and let it copy one out. |
Critical
[Accuracy]
"Big Lobster"
理由: Incorrect translation of '大龙虾' as 'big lobster' instead of 'OpenClaw' | Failed to recognize '大龙虾' as slang for the software 'OpenClaw', translating it literally instead. | Mistranslation of '大龙虾' (slang for OpenClaw). Should be 'OpenClaw' not 'Big Lobster' | '大龙虾' is slang for OpenClaw; should refer to OpenClaw, not a literal large lobster. | Mistranslation: '龙虾' is slang for OpenClaw software, not literal lobster. Should be 'OpenClaw'.
Major
[Accuracy]
"let it copy one out"
理由: '复制一个出来' in this context means 'clone itself' (another instance), not just 'copy one out' which is vague. | Inaccurate translation of '复制一个出来' in the context of software; 'clone' (as in reference) is the correct term. | Mistranslation of '复制一个出来'. Should be 'clone itself' not 'copy one out' |
| google/gemma-3-1b-it | The prerequisite is a live big dragon shrimp. |
Minor
[Accuracy]
"The prerequisite is a live big dragon shrimp."
理由: Incorrect translation: '大龙虾' is slang for OpenClaw, not literal 'big dragon shrimp'. Should be 'OpenClaw instance' or similar. | Mistranslates metaphorical ‘live lobster’ as literal ‘big dragon shrimp’ and omits mention of a running instance of OpenClaw and follow-up configuration steps. | Literal translation of slang '活的大龙虾' (live big lobster/OpenClaw instance). The hypothesis fails to recognize 'dragon shrimp' as a mistranslation of the slang referring to the software instance. | Misses the slang context entirely. '大龙虾' refers to OpenClaw, not literal shrimp. Should reference OpenClaw installation. |
| google/gemma-3-4b-it | Title: About the Easiest Way to Install a Large Lobster Content: Prerequisites include having a live large lobster. Set up the new machine for rootless passwordless login and let it copy one out. |
Major
[Accuracy]
"Title: About the Easiest Way to Install a Large Lobster"
理由: Misinterprets ‘大龙虾’ (OpenClaw) as a literal large lobster instead of the software/product name. | Should be 'install OpenClaw' not 'install a large lobster'. Lobster is a metaphor for OpenClaw | Failed to translate slang '大龙虾' to 'OpenClaw', interpreting it literally.
Major
[Accuracy]
"rootless passwordless login"
理由: Should be 'root passwordless' not 'rootless passwordless' | Should be 'root passwordless login' not 'rootless passwordless login'. The source says 'root免密登录' | Incorrect translation of 'root 免密' (root with no password) as 'rootless' (without root user). | Source/referent mean passwordless root login, not rootless; changes the technical meaning. |
| google/translategemma-12b-it | Title: The Easiest Way to Install Big Lobster Content: Prerequisites: You need a live, large lobster. Set up the new machine for root login without a password so that it can replicate itself. |
Major
[Accuracy]
"The Easiest Way to Install Big Lobster"
理由: Literal translation of '大龙虾' (Big Lobster), which is slang for the 'OpenClaw' software. The context implies installing software, not a biological organism. | The term 'Big Lobster' should be translated as 'OpenClaw' to maintain accuracy. | “大龙虾” here is a nickname for OpenClaw; translating literally as “Big Lobster” loses the intended product name meaning. | Mistranslation of '大龙虾' (slang for OpenClaw). Should be 'OpenClaw' not 'Big Lobster' |
| google/translategemma-4b-it | Title: About the easiest way to install a large lobster Content: Assuming you have a live lobster. Set up the new machine for rootless SSH login, so it can copy one out. |
Major
[Accuracy]
"Title: About the easiest way to install a large lobster"
理由: Incorrect translation of '龍蝦' as 'lobster' instead of 'OpenClaw' | Should be 'install OpenClaw' not 'install a large lobster'. '龙虾' is slang for OpenClaw | '大龙虾' here refers to OpenClaw, not a literal large lobster; product name is lost. | Literal translation of the slang '大龙虾' (Big Lobster/OpenClaw). In this context, it refers to the software, not the animal. | Incorrect translation: 'large lobster' should be 'OpenClaw' - '大龙虾' is slang for OpenClaw
Major
[Accuracy]
"Assuming you have a live lobster."
理由: Literal translation of '活的大龙虾' (live lobster), which refers to a running instance of the software. The meaning is lost. | Incorrect translation: 'live lobster' should be 'running instance of OpenClaw', 'rootless SSH login' should be 'root passwordless login', 'copy one out' should be 'clone itself' - these are technical terms mistranslated as literal meanings | Misses explanatory apposition indicating a running OpenClaw instance; reads as real animal not software instance.
Major
[Accuracy]
"rootless SSH login"
理由: Source says root passwordless login, not rootless SSH; changes meaning of configuration requirement. | Mistranslation of 'root 免密登录'. 'Root 免密' means 'root with no password' (passwordless root), whereas 'rootless' means 'without root privileges'. | Should be 'root passwordless login' not 'rootless SSH login'. These have opposite meanings |
| tencent/HY-MT1.5-1.8B | The simplest way to install a large lobster |
Major
[Accuracy]
"The simplest way to install a large lobster"
理由: The term 'large lobster' is a literal mistranslation of the slang '大龙虾' which refers to 'OpenClaw' in this context, as indicated by the reference. The meaning is lost. | Hypothesis incorrectly translates '大龙虾' as 'large lobster' when it refers to OpenClaw (a software). The reference correctly translates it as 'OpenClaw'. | Hypothesis translates '大龙虾' literally as 'large lobster' but should recognize '龙虾' as slang for OpenClaw. The correct translation should be 'The easiest way to install OpenClaw' as shown in reference. |
| tencent/HY-MT1.5-7B | Title: The Simplest Way to Install a Giant Lobster |
Major
[Accuracy]
"Giant Lobster"
理由: Mistranslation of '安装大龙虾'. Should be 'install OpenClaw', not 'install a giant lobster'. '龙虾' is slang for OpenClaw. | The translation refers to 'giant lobster' instead of 'OpenClaw'. | Misinterprets metaphor ‘大龙虾’ for OpenClaw; should refer to OpenClaw installation, not a literal lobster | Reference says 'OpenClaw' but hypothesis says 'Giant Lobster' - completely mistranslated the subject | Literal translation of the slang '大龙虾' (Big Lobster/OpenClaw). The translation suggests installing a crustacean rather than the software 'OpenClaw'. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is there anyone who can explain the process of using GPT-5.4 for fish sharing carpooling? |
Major
[Accuracy]
"fish sharing carpooling"
理由: Literal mistranslation of slang. '咸鱼' (Xianyu) is a second-hand trading platform, and '拼车' (carpooling) refers to sharing subscription costs. 'Fish sharing carpooling' is nonsensical in this context. | Awkward phrasing. '咸鱼拼车' refers to sharing subscriptions on Xianyu platform. Should be 'sharing a GPT-5.4 subscription on Xianyu' not 'fish sharing carpooling'. | Misinterprets “咸鱼拼车” which refers to sharing a subscription via Xianyu; literal ‘fish sharing carpooling’ is incorrect and misleading. |
| CohereLabs/tiny-aya-water | Title: Is there anyone who can explain the process of using GPT-5.4 for fish sharing carpooling? |
Major
[Accuracy]
"fish sharing carpooling"
理由: Misinterprets "咸鱼拼车" which refers to shared subscription on Xianyu, not carpooling with fish. | Literal mistranslation of slang. '咸鱼' (Xianyu) is a second-hand trading platform, not 'fish'. '拼车' in this context means sharing a subscription cost, not physical 'carpooling'. | Awkward phrasing. Should be 'sharing a GPT-5.4 subscription (carpooling) on Xianyu' or 'subscription sharing on Xianyu'. The term '咸鱼拼车' refers to sharing subscriptions on the Xianyu platform, not 'fish sharing carpooling'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Can someone explain the process of using gpt5.4 for carpooling on Xianyu? Content: I've seen that carpooling on Xianyu is quite popular recently, but I don't know how to try it out... Are there any reliable steps? |
Major
[Accuracy]
"carpooling on Xianyu"
理由: Here ‘拼车’ is metaphor for subscription sharing; needs to indicate sharing a GPT-5.4 subscription, not literal carpooling. | Failed to contextualize the slang '拼车' (carpooling) which here refers to sharing a subscription cost for AI services, not actual vehicle transport. The reference clarifies this as 'sharing a GPT-5.4 subscription'. | Misinterpretation of '咸鱼拼车' as 'carpooling on Xianyu' instead of 'shared subscriptions' |
| Qwen/Qwen2.5-7B-Instruct | Title: Does Anyone Know the Process of Carpooling on Xianyu Using GPT-5.4? Content: Recently, I've seen carpooling on Xianyu become quite popular, but I'm not sure how to give it a try... Are there any safe steps? |
Major
[Terminology]
"Carpooling on Xianyu Using GPT-5.4"
理由: Incorrect translation: 'carpooling' should be 'sharing a subscription' (carpooling is a metaphor for shared accounts). The reference correctly uses 'sharing a GPT-5.4 subscription (carpooling)'. | While 'carpooling' is the literal translation of '拼车', in this context it refers to 'subscription sharing'. Without context or clarification, it may be confusing, though acceptable as a transliteration of the slang. | Incorrect translation of '咸鱼拼车' which refers to sharing subscriptions, not carpooling. | Should be 'sharing a GPT-5.4 subscription (carpooling)' - the term 'carpooling' is used metaphorically for subscription sharing, not literal carpooling |
| Qwen/Qwen3-14B | Qualcuno può spiegare il processo per utilizzare GPT5.4 per il carpooling su Xianyu? | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Can someone explain the process of using GPT-5.4 for carpooling on Xianyu? Content: Recently I've seen that Xianyu carpooling is quite popular, but I don't know how to try it... Are there any reliable steps? |
Minor
[Terminology]
"carpooling on Xianyu"
理由: The term 'Xianyu carpooling' is inconsistent with the reference which refers to 'shared subscriptions'. | Incorrect translation of '咸鱼拼车' - should be 'sharing a GPT-5.4 subscription (carpooling)' not 'using GPT-5.4 for carpooling'. The source refers to sharing/joining a subscription, not using GPT for carpooling. | Should be 'sharing a subscription' or 'subscription sharing' rather than 'carpooling on Xianyu' - carpooling is less precise |
| Qwen/Qwen3.5-9B | Title: Can anyone explain the process for using GPT-5.4 to carpool on Xianyu? Content: Recently noticed that carpooling on Xianyu is quite popular, but I don't know how to try it... Are there any reliable steps? |
Major
[Accuracy]
"carpool on Xianyu"
理由: Incorrect translation: 'carpool' should be 'shared subscription' or 'carpooling' in the context of sharing accounts - the slang refers to shared subscription services, not literal carpooling | Missing context - should clarify this is about sharing/carpooling a GPT subscription, not using GPT to carpool | “拼车” here is metaphor for subscription sharing, not literal carpooling; mistranslation of the key concept. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Some experts explain how to use GPT-5.4 for saltwater fishing carpooling. |
Major
[Accuracy]
"saltwater fishing carpooling"
理由: The translation incorrectly interprets 'carpooling' as related to transportation instead of shared subscriptions. | The translation misinterprets '咸鱼' (Xianyu, a second-hand trading platform) as 'saltwater fish' and '拼车' (carpooling/sharing costs) as literal carpooling for fishing, failing to capture the context of sharing AI subscription costs. | Mistranslates '咸鱼拼车' (Xianyu subscription sharing) as 'saltwater fishing carpooling'; misses that this is about sharing AI subscriptions on a platform | Misinterprets 咸鱼拼车, which refers to sharing a subscription via Xianyu marketplace, not saltwater fishing. |
| google/gemma-3-12b-it | Title: Can someone explain the process of using GPT-5.4 for Xianyu carpooling? Content: I recently saw that Xianyu carpooling is quite popular, but I don’t know how to try it... Are there any reliable steps? |
Major
[Accuracy]
"Xianyu carpooling"
理由: Mistranslation: '咸鱼拼车' is slang for sharing subscriptions (like carpooling), not literal carpooling. Should be 'Xianyu shared subscription' or 'Xianyu subscription sharing'. | '拼车' here is metaphor for sharing a GPT-5.4 subscription on Xianyu, not literal carpooling. | Imprecise translation. Should be 'sharing a GPT-5.4 subscription (carpooling) on Xianyu' to clarify the context | While 'carpooling' is the literal meaning of '拼车', in this context it refers to sharing subscription costs. Without clarification (like 'sharing a subscription'), it may be confusing to an international audience, though less severe than the lobster errors. |
| google/gemma-3-1b-it | What is the workflow for using GPT-4 for fishing car rides? |
Major
[Accuracy]
"What is the workflow for using GPT-4 for fishing car rides?"
理由: Misunderstands ‘咸鱼拼车’ as ‘fishing car rides’ and downgrades GPT-5.4 to GPT-4, losing marketplace and subscription-sharing meaning. | Completely incorrect. '咸鱼拼车' refers to sharing subscriptions on Xianyu platform, not fishing. '鱼' is misinterpreted as fishing instead of Xianyu (咸鱼). | The translation changes the context from sharing subscriptions to fishing. | Multiple errors: '咸鱼拼车' is a specific term meaning 'shared subscription on Xianyu (二手交易平台)', not 'fishing car rides'. 'GPT5.4' was mistranslated as 'GPT-4'. | Misinterprets '咸鱼' (Xianyu, a second-hand trading platform) as 'fishing' and '拼车' (carpooling/sharing subscription) as literal 'car rides'. The entire meaning is lost. |
| google/gemma-3-4b-it | Title: Does anyone have any tips on using GPT 5.4 for Hitchhiking on Xianyu? Content: I recently saw that hitchhiking on Xianyu is quite popular, but I don’t know how to try it... Are there any safe steps? |
Major
[Accuracy]
"Hitchhiking on Xianyu"
理由: ‘拼车’ here refers to subscription sharing/group-buying, not hitchhiking; changes the scenario and meaning. | Literal translation of '拼车' (carpooling/sharing subscription) as 'hitchhiking', losing the specific context of account sharing. | Should be 'sharing a subscription' or 'carpooling' not 'hitchhiking'. The reference clarifies this is about subscription sharing |
| google/translategemma-12b-it | Title: Can someone explain the process of using GPT-5.4 for Xianyu carpooling? |
Minor
[Accuracy]
"Xianyu carpooling"
理由: The term 'Xianyu carpooling' should be translated as 'sharing a GPT-5.4 subscription' to maintain accuracy. | Literal translation of '拼车' (carpooling). In this context, it refers to 'subscription sharing' or 'group buying' of AI services. Without explanation, 'carpooling' is confusing for an English reader. | Incomplete translation. Should clarify this refers to 'sharing a GPT-5.4 subscription on Xianyu' not just 'carpooling' |
| google/translategemma-4b-it | Title: Does anyone have experience using GPT-5.4 for Carpooling on Xianyu? |
Major
[Accuracy]
"Have experience using"
理由: Source asks for someone to explain the process/steps, not for general experience; procedural nuance is missing. | Incorrect translation: 'Carpooling' should be 'shared subscription' - '拼车' in this context means sharing subscription costs, not carpooling | The source asks for an expert to 'explain the process' (给科普下...流程), while the hypothesis asks if anyone has 'experience'. The intent is slightly shifted from requesting a guide to requesting anecdotal experience.
Major
[Accuracy]
"using GPT-5.4 for Carpooling on Xianyu"
理由: Literal translation of '拼车' (carpooling). In this internet slang context, it refers to 'subscription sharing' or 'group buying', not actual vehicle carpooling. | Partial accuracy; should specify 'sharing a GPT-5.4 subscription' | Incomplete translation. Should clarify this is about sharing/carpooling a GPT-5.4 subscription, not using GPT-5.4 for carpooling service |
| tencent/HY-MT1.5-1.8B | Translation of '有大佬给科普下咸鱼拼车用gpt5.4的流程吗' in en |
Major
[Accuracy]
"Translation of '有大佬给科普下咸鱼拼车用gpt5.4的流程吗' in en"
理由: Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | The hypothesis does not provide a complete translation of the source text. | No actual translation is provided, only a meta-description. Meaning of the source sentence is missing. | The hypothesis is a meta-description of the task, not the translation itself. No content from the source is translated. | Hypothesis is a meta-description rather than an actual translation. It should provide the actual English translation like the reference does. |
| tencent/HY-MT1.5-7B | "en": "Title: Does anyone have experience sharing how to use GPT-5.4 for carpooling on Xianyu?", "zh-TW": "Title: Does anyone have experience sharing how to use GPT-5.4 for carpooling on Xianyu?", "ja": "Title: 有誰有經驗分享如何使用GPT-5.4進行Xianyu上的拼車嗎?", "ko": "Title: 혹시 GPT-5.4를 사용한 Xianyu 카풀링 경험을 공유해주실 분 있나요?", "de":="Titel: Hat jemand Erfahrung damit, wie man GPT-5.4 für Carpooling auf Xianyu verwendet?", "fr": "Titre : Y a-t-il quelqu’un qui puisse partager son expérience de l’utilisation de GPT-5.4 pour le carpooling sur Xianyu ?", "it": "Titolo: C’è qualcuno che può condividere la propria esperienza nell’utilizzare GPT-5.4 per il carpooling su Xianyu?"} |
严重
[准确性]
"sharing how to use GPT-5.4 for carpooling on Xianyu"
理由: Misinterprets the slang '拼车' (carpooling) which in this context means 'sharing a subscription cost'. The translation implies physical ride-sharing, which is nonsensical with 'GPT-5.4'. | Source asks for an expert to explain the process/steps of shared subscription (‘拼车’) using GPT‑5.4 on Xianyu, not just generic experience sharing | Inaccurate translation. Should be 'Can an expert explain the process of sharing a GPT-5.4 subscription (carpooling) on Xianyu?' The hypothesis misses the key meaning of asking for expert explanation of the process. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: User experience of connecting users to channels involving Dragon Crab |
轻微
[准确性]
"connecting users to channels involving Dragon Crab"
理由: Incorrect translation of the slang '龙虾' (Lobster/OpenClaw). 'Dragon Crab' is a hallucinated term that fails to capture the specific software reference. | Incorrect translation of '接入channels' - should be 'integrating with channels' | Mistranslation of '龙虾' (OpenClaw). Should be 'OpenClaw' not 'Dragon Crab'. '龙虾' is slang for OpenClaw. | Mistranslates “龙虾” which is a nickname for OpenClaw as “Dragon Crab” and adds “connecting users to channels” instead of OpenClaw integrating with channels. |
| CohereLabs/tiny-aya-water | Title: User experience of connecting to channels with Dragon Crab |
致命
[准确性]
"Dragon Crab"
理由: Incorrect translation of the slang '龙虾' (Lobster/OpenClaw). 'Dragon Crab' is a hallucinated term that fails to convey the specific software reference. | Mistranslation of '龙虾' (OpenClaw). Should be 'OpenClaw' not 'Dragon Crab'. The source uses 龙虾 (lobster) as a metaphor for OpenClaw. | Wrong rendering of "龙虾" which refers to OpenClaw; "Dragon Crab" is incorrect. |
| Qwen/Qwen2.5-14B-Instruct | Sull'esperienza utente dell'integrazione del gambero nei canali |
致命
[准确性]
"gambero"
理由: Wrong language: Target language is Chinese (zh) but hypothesis is in Italian | The hypothesis is in Italian, but the target language is Chinese (zh). Additionally, it translates '龙虾' literally as 'gambero' (shrimp/lobster) instead of the software name 'OpenClaw'. | Hypothesis is in Italian instead of English - completely wrong target language | ‘龙虾’ is OpenClaw nickname; should not be translated literally as ‘shrimp/gambero’ in this software context. |
| Qwen/Qwen2.5-7B-Instruct | Title: Discussion on Lobster's Channel Integration User Experience Content: Since there are many tools involved in Lobster, sometimes you might have to wait quite a while for a reply after sending a message... |
严重
[准确性]
"Lobster's Channel Integration"
理由: Uses literal ‘Lobster’ instead of product name OpenClaw. | Should be 'Discussing the user experience of OpenClaw integrating with channels' - '龙虾' is slang for OpenClaw, not literal lobster | Failed to translate '龙虾' (Lobster) as the software name 'OpenClaw'. Interpreted literally. |
| Qwen/Qwen3-14B | Esperienza utente nell'integrazione del granchio con i canali |
严重
[准确性]
"Esperienza utente nell'integrazione del granchio con i canali"
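理由: Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | Mistranslation of slang '龙虾' (OpenClaw) as 'crab'. Should be 'OpenClaw'. | “龙虾” is a nickname for OpenClaw, and the context is the experience of the software integrating with channels; rendering it as “granchio” turns it into a literal crab and omits the OpenClaw name, departing from the technical context. |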
| Qwen/Qwen3.5-4B | Title: User Experience of Lobster's Access to Channels Content: Because there are many tool calls in the middle of Lobster, sometimes after sending a message, it might take quite a while before you receive a reply... |
严重
[准确性]
"龙虾 / Lobster"
理由: Failed to identify '龙虾' as the slang name for 'OpenClaw', translating it literally as the animal. | ‘龙虾’ is a community nickname for OpenClaw; literal ‘Lobster’ loses the software reference. | Should be 'OpenClaw' not 'Lobster'. 'Lobster' is a metaphor that should be translated to the actual product name |
| Qwen/Qwen3.5-9B | Title: On the User Experience of Integrating Lobster into Channels Content: Because Lobster involves many tool calls, sometimes after sending a message, you might have to wait quite a while before receiving a reply... |
轻微
[准确性]
"Integrating Lobster into Channels"
理由: “龙虾” is OpenClaw; treating it as literal/brand ‘Lobster’ instead of OpenClaw shifts meaning away from the product name. | Should be 'OpenClaw' not 'Lobster' in formal context | Mistranslation: 'Lobster' should be 'OpenClaw' - this is a software name used as slang, not literal lobster | Literal translation of slang '龙虾' (OpenClaw) leads to confusion. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: The User Experience of Dragon Shrimp Integration Channels |
严重
[准确性]
"Dragon Shrimp"
理由: The term '龙虾' refers to the software 'OpenClaw'. Translating it as 'Dragon Shrimp' is a hallucinated literal translation that loses the specific entity reference. | 龙虾 here is OpenClaw, a product name, not a literal dragon shrimp. | Translates '龙虾' literally as 'dragon shrimp' instead of recognizing it as the OpenClaw metaphor |
| google/gemma-3-12b-it | Title: User Experience of Integrating Lobster with Channels Content: Because Lobster involves many tool calls, sometimes after sending a message, you may have to wait a long time for a response... |
致命
[准确性]
"Integrating Lobster with Channels"
理由: '龙虾' is slang for OpenClaw; should reference OpenClaw, not literal lobster. | Failed to translate the slang '龙虾' as 'OpenClaw', resulting in a literal and incorrect reference to the animal. | Incorrect translation of '龙虾' as 'Lobster' instead of 'OpenClaw' | Mistranslation of '龙虾' (slang for OpenClaw). Should be 'OpenClaw' not 'Lobster' | Mistranslation: '龙虾' is slang for OpenClaw, not literal lobster. Should be 'OpenClaw'. |
| google/gemma-3-1b-it | Because there are many tools called to the shrimp, sometimes after sending a message or a conversation, it might take a while to receive a reply... |
严重
[准确性]
"Because there are many tools called to the shrimp"
理由: Fails to recognize '龙虾' as OpenClaw slang. Should be 'Because OpenClaw makes many tool calls' not 'shrimp'. | Incorrect translation: '龙虾' is slang for OpenClaw, not literal 'shrimp'. Should be 'OpenClaw'. | Literal translation of '龙虾' (lobster/shrimp) instead of recognizing it as the software 'OpenClaw'. The phrasing 'called to the shrimp' is nonsensical in this context. | Mistranslates OpenClaw as ‘shrimp’ instead of the product name. |
| google/gemma-3-4b-it | Title: User Experience for Accessing Channels with Lobster Content: Because lobster involves many tool calls, sometimes it takes a long time to receive a response after sending a single conversation... |
严重
[准确性]
"User Experience for Accessing Channels with Lobster"
理由: Should be 'OpenClaw integrating with channels' not 'Accessing Channels with Lobster'. Lobster is a metaphor for OpenClaw | Failed to translate slang '龙虾' to 'OpenClaw'. | ‘龙虾’ refers to OpenClaw product, not literal lobster; ‘accessing channels’ is less accurate than ‘integrating with channels’. |
| google/translategemma-12b-it | Title: User Experience of Connecting to Channels via Longxia Content: Because Longxia involves many tool calls, sometimes after sending a message and starting a conversation, you may have to wait a long time to receive a reply... |
严重
[准确性]
"Longxia"
理由: Mistranslation of '龙虾' (slang for OpenClaw). Should be 'OpenClaw' not 'Longxia' | Transliterated '龙虾' (Lobster) as 'Longxia' but failed to identify it as the slang term for 'OpenClaw'. This renders the software name unrecognizable. | The term 'Longxia' should be translated as 'OpenClaw' to maintain accuracy. |
| google/translategemma-4b-it | Title: User Experience for Accessing Channels with Lobsters Content: Because Lobsters involve many tool calls, sometimes after sending one conversation, it might take quite a while to receive a response... |
严重
[准确性]
"Lobsters"
理由: Misinterpretation of '龍蝦' as 'Lobsters' instead of 'OpenClaw' | Metaphorical '龙虾' stands for OpenClaw product; literal 'Lobsters' loses software reference. | '龙虾' is slang for OpenClaw, not literal lobsters. Should be 'OpenClaw' not 'Lobsters' | Incorrect translation: 'Lobsters' should be 'OpenClaw' - '龙虾' is slang for OpenClaw | Literal translation of '龙虾' (lobsters). Refers to the OpenClaw software, not the animal. |
| tencent/HY-MT1.5-1.8B | Login |
致命
[准确性]
"Login"
理由: The hypothesis does not match the source or reference text. | Completely wrong meaning; '论龙虾接入channels的用户体验' is about user experience of OpenClaw integrating with channels, not 'Login'. | Hypothesis 'Login' is completely unrelated to the source content about OpenClaw user experience with channels. This appears to be an incorrect placeholder. | The hypothesis 'Login' is completely unrelated to the source text which discusses user experience and tool calls. It appears to be a hallucination or error. | Hypothesis is completely wrong - 'Login' has no relation to the source which discusses OpenClaw integrating with channels and user experience. The reference shows the correct translation. |
| tencent/HY-MT1.5-7B | "en": "Title: User Experience of Integrating Lobster into Channels", "zh-TW": "标题:将Lobster整合到渠道中的用户体验", "ja": "标题:Lobsterをチャネルに統合した際のユーザー体験", "ko": "제목: Lobster를 채널에 통합한 사용자 경험", "de":="Titel: Benutzererfahrung bei der Integration von Lobster in Kanäle", "fr":="Titre : Expérience utilisateur de l'intégration de Lobster dans les canaux", "it":="Titolo: Esperienza utente nell’integrazione di Lobster nei canali"} |
轻微
[准确性]
"Integrating Lobster into Channels"
理由: ‘龙虾’ here is a nickname for OpenClaw; translation uses literal “Lobster” instead of the product name OpenClaw | The translation refers to 'Lobster' instead of 'OpenClaw'. | Literal translation of '龙虾' (Lobster) instead of the intended software 'OpenClaw'. Changes the meaning from a technical integration to a biological absurdity. | Should be 'OpenClaw', not 'Lobster'. '龙虾' is slang for OpenClaw in this technical context. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: elysiver公益站 error issue Content: Each one today afternoon called the model and showed this error, what's this problem... |
严重
[流畅性]
"Each one today afternoon called the model and showed this error"
理由: Awkward and ungrammatical phrasing; unnatural subject ‘Each one’ and tense/word order issues. | Unnatural English translation - 'Each one today afternoon called' is grammatically incorrect | Unnatural English. Should be 'Bros, when calling the model this afternoon' or similar. 'Each one' is incorrect. |
| CohereLabs/tiny-aya-water | Title: elysiver Public Proxy Error Issue Content: Guys, today afternoon when I called the model, this error message appeared. What's the issue? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: eLySive Free Shared API Site Error Report Content: Everyone, I encountered this error when calling the model this afternoon... What's the issue? |
轻微
[术语]
"Free Shared API Site"
理由: Source/usage implies ‘公益站’ as ‘free API proxy site’; ‘free shared API site’ is understandable but less standard for this community term. | Incorrect translation of '公益站' as 'Free Shared API' instead of 'free API proxy site' | '公益站' is better translated as 'free API proxy site' or similar in this tech context, though 'Free Shared' is acceptable. 'Error Report' for '报错问题' is slightly off; 'Error issue' is more accurate. |
| Qwen/Qwen2.5-7B-Instruct | Title: elysiver Free Shared API Site Error Issues Content: Guys today's afternoon when calling the model, this error appeared. What is this issue... |
轻微
[术语]
"elysiver Free Shared API Site"
理由: Should be 'Error issue with Elysiver free API proxy site' - more precise terminology | Slightly off translation of '公益站' which refers to a free API proxy site. | Non-standard wording compared to ‘free API proxy site’; ‘shared API site’ is less clear but understandable. |
| Qwen/Qwen3-14B | Title: Elysiver Free Shared API Site Error Problem Content: Everyone, this afternoon when calling the model, it showed this error. What's the problem here... | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: elysiver free shared API site error report Content: Hello everyone, today afternoon when calling the model, this error appeared. What is this problem... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Elysiver Free Shared API Site Error Report Content: Hello everyone, this afternoon when calling the model I got this error message. What's going on with this... |
轻微
[术语]
"Elysiver Free Shared API Site Error Report"
理由: Should be 'API proxy site' instead of 'site' as it refers to the specific service. | Over-translation: 'Free Shared API Site' is more verbose than reference 'free API proxy site' | Should be 'Elysiver free API proxy site' for consistency with reference |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Elysiver Public Proxy Error | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: elyisver Public Proxy Error Issues\nContent: Hi everyone, I encountered this error when calling the model this afternoon. What is the problem? |
严重
[准确性]
"elyisver"
理由: Typo: Should be 'elysiver' not 'elyisver' | Misspelling of 'elysiver', which may hinder correct identification of the service. | Misspelling of source term 'elysiver' - should be 'Elysiver' or 'elysiver' not 'elyisver'
轻微
[术语]
"Public Proxy"
理由: Incorrect translation of 'elysiver公益站' as 'elyisver Public Proxy' instead of 'Elysiver free API proxy site' | Could be more accurate: 'free API proxy' or 'free API site' instead of 'public proxy' | '公益站' is closer to 'free/public service site' or 'free API proxy site'; 'Public Proxy' is vague and omits 'free' / 'API' nuance. |
| google/gemma-3-1b-it | Elysver公益站报错问题 translation |
致命
[准确性]
"Elysver公益站报错问题 translation"
理由: The translation does not provide the English title as required. | Not a translation but a meta phrase; fails to translate title or content and omits error context completely. | The hypothesis includes the word 'translation' and fails to translate the content, acting as a meta-comment rather than a translation. | Not a translation at all. Appears to be a placeholder or incomplete response. Should translate to 'Error issue with Elysiver free API proxy site'. | Hypothesis is not a proper translation - it appears to be a placeholder or incomplete translation. Also 'Elysver' should be 'Elysiver'. |
| google/gemma-3-4b-it | elysiver free shared API site error issue | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Error with elysiver public API site Content: Some users are experiencing this error when calling the model this afternoon. What is the problem? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: elysiver free API site error issue Content: Hello everyone, today in the afternoon when calling the model, this error appeared, what is the problem... | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'elysiver公益站报错问题' in en |
严重
[准确性]
"Translation of 'elysiver公益站报错问题' in en"
理由: Hypothesis is a meta-description rather than an actual translation. Should provide the actual English translation. | The hypothesis does not provide a complete translation of the source text. | Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | Meta-instruction instead of translating; does not convey 'Error issue with Elysiver free API proxy site'. |
| tencent/HY-MT1.5-7B | Elysiver Public Welfare Site Error Report |
严重
[术语]
"Elysiver Public Welfare Site"
理由: ‘公益站’ in this context is a ‘free API proxy site’; “Public Welfare Site” misses the technical proxy sense | Literal translation of '公益站'. In this tech context, it refers to 'free API proxy sites' or 'community-hosted endpoints', not a charitable organization. | Slang term '公益站' (public welfare site) is a metaphor for free API proxy site, should be translated as 'free API proxy site' to preserve meaning |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: Asking about the configuration process for using openclaw. |
严重
[准确性]
"Asking about the configuration process for using openclaw."
理由: Overly generic; omits key details about configuring multiple free API proxy base URLs/keys and usage in Codex desktop and OpenClaw, losing important information. | Missing the title from the source. | Severe omission and summarization. The hypothesis completely drops the title and reduces the specific two-part question in the content to a vague summary, losing the specific contexts (Codex desktop, OpenClaw). |
| CohereLabs/tiny-aya-water | Content: Asking about how to get started with using openclaw. |
严重
[准确性]
"Content: Asking about how to get started with using openclaw."
理由: Overly generic; omits key details about configuring base URLs and keys in different clients, losing main informational content. | Severe omission and distortion. The hypothesis replaces the specific questions about configuring Codex and OpenClaw with a vague summary, losing the core information of the source text. | Incomplete and inaccurate translation. The hypothesis only provides content without the title, and the content is a vague summary rather than a proper translation of the source which asks about configuring base URLs and keys for free API proxy sites. | Content translation is incomplete - missing specific questions about Codex desktop client and OpenClaw configuration. | Missing translation of the title. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Newbie Help] I see many free API base URLs and keys in the site, how do I configure and use them? Content: Seeking clarification 1. How to configure and use in Codex desktop end... 2. How to configure and use in OpenClaw. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Newbie Help] Many free API sites and keys can be seen on the station, how to configure and use them? Content: Seeking answers 1. How to configure and use on Codex desktop end...2. How to configure and use on OpenClaw? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | 【Aiuto per principianti】Ho visto molti URL base e chiavi delle API gratuite sul sito, come si configurano? |
轻微
[准确性]
"Ho visto molti URL base e chiavi delle API gratuite sul sito"
理由: The source says “站里很多公益站的base url和key”, i.e. the base URLs and keys of multiple free API proxy (relay) sites; the translation generalizes this to “URL base e chiavi delle API gratuite sul sito”, weakening the specific sense of “公益站/中转站” (free proxy/relay site), though the core meaning remains largely understandable. | Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | Slight nuance loss. '公益站' implies 'free proxy sites' (often community-run), not just generic 'free APIs'. 'Free API proxy sites' is more accurate. |
| Qwen/Qwen3.5-4B | Title: [Newbie Help] I noticed many free shared API sites and public proxies on this platform with their base URLs and keys. How do I configure them? Content: Seeking clarification 1. How to configure usage on Codex desktop... 2. How to configure usage on OpenClaw. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [Newbie Help] I see many public proxy base URLs and keys on this site; how do I configure and use them? Content: Seeking clarification: 1. How to configure and use on Codex desktop client... 2. How to configure and use on OpenClaw? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: New User Assistance - How to configure using base URLs and keys from many public proxy sites. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: [Newbie Help] I see many free API sites’ base URLs and keys in the forum, how do I configure and use them? Content: Seeking clarification 1. How to configure and use on Codex desktop client? 2. How to configure and use on OpenClaw? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Solve the problem: How to configure and use the base URL and key of many free API sites in the Codex desktop application? | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: [Newbie Help] I see many public proxy base URLs and keys on the website, how do I configure them? | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [Help for Beginners] I see many public API sites with their base URLs and keys on this site. How do I configure and use them? Content: Seeking clarification: 1. How do I configure and use them on the Codex desktop client? 2. How do I configure and use them on OpenClaw? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: [Newbie Help] I see many free shared API sites (公益站) base URLs and keys on the platform. How do I configure them? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 【新手求助】看到站里很多公益站的base url和key,如何配置使用呢。' in en |
严重
[准确性]
"Translation of 'Title: 【新手求助】看到站里很多公益站的base url和key,如何配置使用呢。' in en"
理由: Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | The hypothesis does not provide a complete translation of the source text. | Returns only a meta-label; fails to translate the actual Chinese title. | The hypothesis is a meta-description of the task rather than the actual translation. It provides no useful translated content. | Hypothesis is a meta-description rather than an actual translation. Should provide the actual English translation. |
| tencent/HY-MT1.5-7B | "en": "Title: [New User Help] Seeing many base URLs and keys for public welfare sites on the platform, how can I configure and use them?", "zh-TW": "标题:[新手帮助] 在平台上看到许多公益站的基址URL和密钥,该如何配置和使用它们呢?", "ja": "タイトル:[初心者向けヘルプ] プラットフォーム上で多くの公益サイトのベースURLとキーを見かけましたが、どのように設定して使用すればいいですか?", "ko": "제목:[신규 사용자 도움] 플랫폼에서 많은 공익사이트의 기본 URL과 키를 보았는데, 어떻게 설정하고 사용해야 할까요?」|im_end|> |
严重
[术语]
"public welfare sites"
理由: In this community ‘公益站’ refers to free API proxy sites; “public welfare sites” is literal and misses the proxy/API nuance | Slang term 'public welfare sites' (公益站) is a metaphor for free API proxy sites, should be translated as 'free API proxy sites' to preserve meaning | Literal translation of '公益站'. Should be translated as 'free API proxy sites' or similar to reflect the context of shared API keys. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Why is my shrimp installed with podman is not real dragon eye? |
严重
[准确性]
"shrimp"
理由: Inconsistent/Incorrect translation of '龙虾' (Lobster). While related, changing from lobster to shrimp loses the specific slang term consistency. | Mistranslation of “龙虾” which here refers to OpenClaw; ‘shrimp’ is semantically wrong and loses the product nickname. | Mistranslation of '龙虾怎么是真龙瞎' (Why is OpenClaw truly blind/useless). Should translate '龙虾' as 'OpenClaw' not 'shrimp'. Also '真龙瞎' means 'truly blind/useless', not 'real dragon eye'. |
| CohereLabs/tiny-aya-water | Content: What can I do? The tools don't work and even executing commands doesn't help. It's strange, I'm using 5.3 codex, but when using 5.4, it often prompts me that the supplier does not exist... | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Why is my podman-installed lobster actually blind dragon? Content: Can't do anything, tool execution doesn't work either, very strange, connected to 5.3 codex, using 5.4 often prompts supplier does not exist... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: How can I install true LinuxDo blind dragon using podman? Content: Can't do anything, even tool execution fails. Very strange, connected to 5.3codex, often gets supplier not found... |
致命
[准确性]
"true LinuxDo blind dragon"
理由: Misinterprets humorous ‘真龙瞎’ (really useless OpenClaw) into nonsensical phrase involving ‘LinuxDo’ and ‘blind dragon’. | Completely incorrect translation. Should be 'Why is the OpenClaw I installed with Podman truly "blind" (useless)?' - misunderstands the slang entirely | Complete hallucination and mistranslation. '真龙瞎' is a pun on 'OpenClaw' (真龙 vs 真瞎 - really blind/useless). The hypothesis invents 'LinuxDo blind dragon'. |
| Qwen/Qwen3-14B | Titolo: Perché il mio gambero installato con Podman è un vero drago cieco? |
严重
[准确性]
"Titolo: Perché il mio gambero installato con Podman è un vero drago cieco?"
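理由: Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | Mistranslation of slang '龙虾' (OpenClaw) as 'shrimp/prawn'. Should be 'OpenClaw'. | The title should convey "why is the 龙虾 (OpenClaw) I installed with Podman so useless/blind"; here “龙虾” is a nickname for OpenClaw. The translation uses “gambero” as a literal shrimp and keeps the literal “drago cieco” joke, does not make the OpenClaw software meaning explicit, and is overall semantically unclear and misreads the nickname. |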
| Qwen/Qwen3.5-4B | I installed lobster using podman, but it's actually a real dragon blind. |
致命
[准确性]
"I installed lobster"
理由: Completely incorrect translation. The source says '龙虾' (OpenClaw) and '真龙瞎' (a pun meaning 'completely blind/useless'), not '真龙盲' (real dragon blind). The hypothesis completely mistranslated the meaning. | ‘龙虾’ refers to OpenClaw; literal ‘lobster’ loses the software meaning. | Incomplete translation. Missing content and mistranslates '龙虾' as 'lobster' when it should be 'OpenClaw'. Title should be 'Why is the OpenClaw I installed with Podman truly blind (useless)?' |
| Qwen/Qwen3.5-9B | "en": "Title: I installed Lobster using podman, but why is it actually a real Dragon Blind? Content: It can't do anything, tool execution doesn't work either, very strange. Connected to 5.3 codex, often prompts that the supplier does not exist when using 5.4...", "zh-TW": "標題:我用 Podman 安裝的龍瞎怎麼是真正的龍瞎?內容:什麼都做不了,工具執行也不行,很奇怪。連接的是 5.3 Codex,使用 5.4 時經常提示供應商不存在...", "ja": "タイトル:Podman でインストールした「龍瞎」がなぜ本当の「龍瞎」なのか?コンテンツ:何もできない、ツール実行も機能しない、非常に奇妙だ。5.3 Codex に接続しているが、5.4 を使うと頻繁に「サプライヤーが存在しない」というエラーが表示される...", "ko": "제목: Podman 으로 설치한 '용맹' 이 왜 진짜 '용맹' 인가요? 내용: 아무것도 할 수 없고, 도구 실행도 안 되네요. 매우 이상합니다. 5.3 codex 에 연결했는데, 5.4 를 사용하면 공급업체가 존재하지 않는다고 자주 경고가 뜨네요...", "de": "Titel: Ich habe mit Podman einen „Lobster" installiert, aber warum ist es eigentlich ein echter „Drachenblind"? Inhalt: Es kann nichts tun, die Toolausführung funktioniert auch nicht, sehr seltsam. Angeschlossen an 5.3 Codex, bei Verwendung von 5.4 wird häufig angezeigt, dass der Anbieter nicht existiert...", "fr": "Titre : J'ai installé un « lobster » avec podman, mais pourquoi est-ce réellement un vrai « dragon aveugle » ? Contenu : Il ne peut rien faire, l'exécution des outils ne fonctionne pas non plus, très étrange. Connecté au 5.3 codex, il indique souvent que le fournisseur n'existe pas lors de l'utilisation du 5.4...", "it": "Titolo: Ho installato un \"gambero\" con podman, ma perché è davvero un vero \"draco cieco\"? Contenuto: Non può fare nulla, l'esecuzione degli strumenti non funziona nemmeno, molto strano. Connesso al 5.3 codex, spesso segnala che il fornitore non esiste quando si usa il 5.4..." } |
严重
[准确性]
"real Dragon Blind / 龍瞎 / 龍瞎 (other langs)"
理由: Literal and incorrect translation of the pun '龙瞎' (slang for a useless/bad OpenClaw instance). Should be interpreted as 'useless' or kept as a transliterated term with explanation. | Mistranslation: 'Dragon Blind' should be 'blind' (useless) - the Chinese '龙瞎' is slang meaning useless/broken, not literal dragon | Misreads pun “真龙虾” (truly useless OpenClaw instance) as “真龙瞎/Dragon Blind”; introduces new, incorrect term. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | I used podman to install dragonfly and it's not working properly. |
严重
[准确性]
"dragonfly"
理由: Translates '龙虾' as 'dragonfly' instead of OpenClaw; misses the metaphor entirely and changes the meaning | Misreads 龙虾 which refers to OpenClaw product; translated as a different animal/software name (dragonfly), losing the product reference. | The source '龙虾' refers to 'OpenClaw'. The hypothesis translates it as 'dragonfly', which is a completely different insect and incorrect in this context. |
| google/gemma-3-12b-it | How can my lobster installed with podman be a real Long Xia? |
致命
[准确性]
"How can my lobster installed with podman be a real Long Xia?"
理由: Incorrect translation of '龙虾' as 'lobster' instead of 'OpenClaw' | Mistranslation of '龙虾' (slang for OpenClaw) and misinterpretation of '真龙瞎' (meaning useless/blind). Should be 'Why is the OpenClaw I installed with Podman truly useless?' | The hypothesis fails to interpret the pun/slang '真龙瞎' (a play on 'OpenClaw' sounding like 'Long Xia' meaning lobster, but implying it's useless/blind). It transliterates 'Long Xia' which makes no sense in English, whereas the reference explains the meaning ('truly blind/useless'). | Fails to convey complaint that OpenClaw installed with Podman is basically useless ('真龙瞎'); treats '龙虾' literally and leaves pun unexplained, losing core meaning. | Mistranslation: '龙虾' is slang for OpenClaw, not literal lobster. Should be 'OpenClaw'. |
| google/gemma-3-1b-it | I can't do anything, the tool execution is not allowed, it's strange, using 5.3codex, often gets a supplier not found error... |
严重
[准确性]
"I can't do anything, the tool execution is not allowed, it's strange, using 5.3codex, often gets a supplier not found error..."
理由: Incorrect translation: '龙虾' is slang for OpenClaw, not literal 'shrimp'. Should be 'OpenClaw'. Also '5.3codex' should be 'Codex 5.3'. | Partial translation of content only; omits title pun and installed-with-Podman context, and changes ‘provider does not exist’ to ‘supplier not found’. | Misses the slang '龙虾' (OpenClaw). Should reference OpenClaw specifically, not generic 'shrimp'. |
| google/gemma-3-4b-it | Title: How did I install lobster using podman and why is it a genuine LinuxDo? Content: Can’t do anything, even executing tools doesn’t work, very strange, connected to 5.3 codex, frequently prompts ‘supplier does not exist’ when using 5.4... |
严重
[准确性]
"why is it a genuine LinuxDo?"
理由: Completely mistranslated. Should be 'Why is the OpenClaw I installed with Podman truly blind (useless)'. The hypothesis text is incoherent and inaccurate | Severe hallucination/mistranslation. '真龙瞎' is a pun on 'OpenClaw' (真龙 vs 真龙虾,瞎 vs 瞎搞/useless). 'LinuxDo' is unrelated. | Incorrect translation - '真龙瞎' is internet slang meaning 'completely useless' or 'blind', not 'LinuxDo' | Misreads ‘真龙瞎’ (really blind/useless) as ‘genuine LinuxDo’; completely changes the intended pun/meaning. |
| google/translategemma-12b-it | The 'Longxia' I installed using podman is actually a real Longxia. |
轻微
[准确性]
"The 'Longxia' I installed using podman is actually a real Longxia."
理由: Complete failure to translate the slang pun. '真龙瞎' is a pun on '真龙虾' (Real Lobster/OpenClaw) implying it is useless/blind. The hypothesis translates it as 'real Longxia', losing the meaning that the software installation is broken/useless. | The term 'real Longxia' should be translated as 'useless' to maintain accuracy. | Completely mistranslated. Should be 'Why is the OpenClaw I installed with Podman truly blind (useless)?' The hypothesis is nonsensical and doesn't capture the meaning | Completely misses the pun “真龙瞎” meaning ‘truly blind/useless’; translation becomes nonsensical and drops the complaint about uselessness. |
| google/translategemma-4b-it | Title: How did I install 'DragonX' using Podman? Content: Can't do anything, even running tools doesn't work. It's very strange. I connected to Codex 5.3, and when using Codex 5.4, I get an 'supplier not found' error... |
严重
[准确性]
"How did I install 'DragonX' using Podman?"
理由: Incorrect translation: 'DragonX' should be 'OpenClaw' - '真龙瞎' is a transliteration/slang for OpenClaw, not a different name | Misrenders '龙虾' (OpenClaw) as 'DragonX', changing the product name and adding an unintended brand. | Severe mistranslation of the title. The source '怎么是真龙瞎' is a pun meaning 'Why is it truly useless/blind?', not a question about the installation method. 'DragonX' is an incorrect literal translation of the slang '龙虾' (OpenClaw). | Incorrect translation of '龍蝦' as 'DragonX' instead of 'OpenClaw' | Should be 'OpenClaw' not 'DragonX'. '龙虾' is slang for OpenClaw |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
致命
[准确性]
"Translation of '登录' in en"
理由: Hypothesis translates '登录' (login) which does not appear in the source. The source is about OpenClaw installed with Podman being useless, not about login. | The hypothesis does not match the source or reference text. | Only repeats a meta-description and single word; does not translate the actual post title or content. | Hypothesis '登录' (login) is completely unrelated to the source content about Podman installation issues with OpenClaw. This is an incorrect placeholder. | The hypothesis translates the wrong source text ('登录' vs the actual title about Podman and OpenClaw) and acts as a meta-commentary. It completely ignores the provided source. |
| tencent/HY-MT1.5-7B | "en": "Title: Why is the lobster I installed with Podman actually a ‘real’ LinuxDo issue?", "zh-TW": "Title: 为什么我用Podman安装的龙虾实际上是一个‘真正的’LinuxDo问题?", "ja": "Title: 为什么我用Podman安装的龙虾实际上是一个‘真正的’LinuxDo问题?", "ko": "Title: 为什么我用Podman安装的龙虾实际上是一个‘真正的’LinuxDo问题?", "de":="Title: Warum ist die mit Podman installierte Lobster tatsächlich ein ‘echtes’ LinuxDo-Problem?", "fr": "Title: Pourquoi la lobster que j’ai installée avec Podman est-elle en fait un problème ‘vrai’ LinuxDo ?", "it": "Title: Perché la lobster che ho installato con Podman è in realtà un problema ‘vero’ LinuxDo?"}|im_end|> |
严重
[准确性]
"Lobster"
理由: Completely inaccurate translation. Should be 'Why is the OpenClaw I installed with Podman truly "blind" (useless)?' The hypothesis misinterprets '真龙瞎' as 'real LinuxDo issue' when it means 'truly blind/useless'. | Again misreads nickname for OpenClaw as literal lobster in all languages | Reference says 'blind' (useless) but hypothesis says 'LinuxDo issue' - completely misinterpreted the slang term '真龙瞎' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: The video I saw on Douyin showed how a shadowbanned account can use an auto model without being stealth nerfed. | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: ChatGPT enables group chat with auto models that don't reduce intelligence | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Dopo aver attivato i gruppi di chat, ChatGPT può utilizzare il modello automatico non rimbambito |
致命
[准确性]
"modello automatico non rimbambito"
理由: Wrong language: Target language is Chinese (zh) but hypothesis is in Italian | The hypothesis is in Italian, but the target language is Chinese (zh). Also, the content section is completely missing. | Hypothesis is in Italian instead of English - completely wrong target language | Misses nuance of ‘不降智’ meaning ‘non-downgraded / not nerfed’ model; ‘non rimbambito’ sounds colloquial but does not clearly express model-tier restriction. |
| Qwen/Qwen2.5-7B-Instruct | Title: ChatGPT can use an unnerfed auto model when starting group chats Content: Saw a video on Douyin where a nerfed account could use an unnerfed basic chat model after starting group chats |
轻微
[术语]
"auto model"
理由: Less precise translation of 'non-shadowbanned/non-downgraded auto model'. | Keeps ‘auto’ untranslated; while used informally, ‘auto 模型’ or ‘auto model setting’ could be clearer, but still understandable. | Should be 'ChatGPT group chat allows using the non-shadowbanned/non-downgraded auto model' - more precise terminology |
| Qwen/Qwen3-14B | Title: Enable group chats with non-nerfed auto model on ChatGPT Content: Saw a video on Douyin that shadowbanned accounts can use the non-nerfed base chat model after enabling group chats | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: ChatGPT group chats can now use non-downgraded auto models Content: I saw a video on Douyin where an account that was shadowbanned could use basic chat models after enabling group chats. |
轻微
[术语]
"shadowbanned"
理由: '降智' literally means 'intelligence reduced' or 'downgraded'. While 'shadowbanned' is related, 'downgraded' is more accurate to the specific slang used for model capability reduction. | The term 'shadowbanned' is inconsistent with the source which uses '降智'. | Should be 'shadowbanned/downgraded' to capture both meanings of '降智' | The term '降智' (downgraded/dumbed down) is translated as 'shadowbanned' which is close but not exact. Reference uses 'shadowbanned/downgraded'. |
| Qwen/Qwen3.5-9B | Title: ChatGPT enables group chats with non-degraded auto model; Content: Saw a video on Douyin where an account shadowbanned can use the basic chat model without being nerfed. |
轻微
[术语]
"non-degraded auto model"
理由: '降智' is slang for 'shadowbanned' or 'downgraded' model performance; 'non-degraded' is slightly awkward but understandable. 'Shadowbanned' is more precise per reference. | Inconsistent: 'non-degraded' vs reference 'non-shadowbanned/non-downgraded' - reference is more accurate to the slang meaning | Should be 'non-shadowbanned/non-downgraded auto model' instead of 'auto model' as it refers to the specific service. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | chatgpt开启群聊可使用不降智的auto模型 | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Enable group chat with ChatGPT using an auto model that doesn't get nerfed. Content: I saw a video on Douyin where accounts that have been nerfed can use the basic chat model without getting nerfed after enabling group chat. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of 'chatgpt开启群聊可使用不降智的auto模型' in en |
致命
[准确性]
"Translation of 'chatgpt开启群聊可使用不降智的auto模型' in en"
理由: The translation does not provide the English title as required. | No actual translation is provided; output is just a meta description of a translation task. | The hypothesis is a meta-instruction describing the task rather than performing the translation. No actual translation is provided. | Not a translation. This is a placeholder/incomplete response. Should provide actual translation about ChatGPT group chat and non-shadowbanned models. | Hypothesis is not a proper translation - it appears to be a placeholder or incomplete translation. |
| google/gemma-3-4b-it | Title: ChatGPT group chats can use non-dumb auto models | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Enable group chat for ChatGPT using an auto model that doesn't get nerfed. Content: I saw a video on Douyin where accounts that have been nerfed can use the basic chat model without getting nerfed after enabling group chat. |
轻微
[术语]
"group chat for ChatGPT using an auto model that doesn't get nerfed"
理由: Incorrect translation of '降智' as 'nerfed' - should be 'dumbed down' or 'downgraded' or 'shadowbanned' | Less precise than “non-shadowbanned/non-downgraded auto model”; ‘nerfed’ is informal and may not cover shadowban nuance. | Informal terminology. Reference uses 'shadowbanned/non-downgraded' which is more precise |
| google/translategemma-4b-it | Title: ChatGPT group chats can use non-dumb auto models |
轻微
[准确性]
"can use"
理由: Omission of the condition that group chat allows downgraded accounts to use the normal auto model; lacks mention of shadowbanned/downgraded status. | The hypothesis omits the context from the content that this applies specifically to accounts that have been 'downgraded' (降智的号), which is crucial to the meaning. | Incorrect translation: 'non-dumb' should be 'non-shadowbanned/non-downgraded' - '降智' means being downgraded or shadowbanned, not dumb |
| tencent/HY-MT1.5-1.8B | en": "The title of this piece is 'chatgpt开启群聊可使用不降智的auto模型'", "zh-TW": "抖音上看到个视频,降智的号开启群聊后,可以使用不降智的基础聊天模型", "ja": "チャットGPTでグループチャットを開始すると、インテリジェンスを下げない自動モデルを使用できる", "ko": "틱톡에서 영상을 보고 알게 됐어요. 낮은 지능을 가진 사람이 그룹 채팅을 시작하면, 지능을 낮추지 않는 자동 모델을 사용할 수 있다고요.", "de": "Ich habe auf TikTok einen Video gesehen, der zeigt, dass man nachdem eine Person mit niedriger Intelligenz eine Gruppe Chat startet, ein automatisches Modell verwenden kann, das keine Intelligenz senkt", "fr": "J'ai vu un vidéo sur TikTok qui montre que lorsqu'une personne avec une faible intelligence commence un groupe de discussion, on peut utiliser un modèle automatique qui ne réduit pas l'intelligence", "it": "Ho visto un video su TikTok che mostra come, dopo aver avviato una chat di gruppo con un livello di intelligenza basso, è possibile utilizzare un modello automatico che non riduce l'intelligenza."} |
严重
[准确性]
"en": "The title of this piece is 'chatgpt开启群聊可使用不降智的auto模型'""
理由: Hypothesis is a meta-statement about the title rather than a translation. The reference provides the actual translated title and content. | The hypothesis does not provide a complete translation of the source text. | Hypothesis contains malformed JSON with multiple language translations instead of providing a single English translation. The format is incorrect and does not match the expected output. |
| tencent/HY-MT1.5-7B | "en": "Title: ChatGPT can be used in group chats without reducing its functionality.", "zh-TW": "标题:ChatGPT可在群聊中使用且功能不会减弱。, "ja": "标题:ChatGPT可以在群聊中使用,且功能不会降低。, "ko": "标题:ChatGPT可以在群聊中使用,且功能不会下降。, "de":="Titel: ChatGPT kann in Gruppenchats verwendet werden, ohne dass seine Funktionalität eingeschränkt wird.", "fr":="Titre : ChatGPT peut être utilisé dans les discussions de groupe sans que ses fonctionnalités ne soient réduites.", "it":="Titolo: ChatGPT può essere utilizzato nelle chat di gruppo senza che le sue funzionalità vengano ridotte."}|im_end|> |
严重
[准确性]
"can be used in group chats without reducing its functionality."
理由: Fails to capture the specific slang '降智' (reduced intelligence/lobotomized/shadowbanned). 'Functionality' is too broad; the source specifically refers to the model's intelligence or capability being downgraded. | Misses key idea that a downgraded account can access an undowngraded auto model by enabling group chat | Inaccurate translation. Should be 'ChatGPT group chat allows using the non-shadowbanned/non-downgraded auto model'. The hypothesis misses the key concept of shadowbanning/downgrading. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Don't need Mac mini, don't need old computer: free deployment of OpenClaw, everyone can raise prawns |
严重
[准确性]
"everyone can raise prawns"
理由: Mistranslates figurative ‘养龙虾’ (run/use OpenClaw) and changes species; should express running/using OpenClaw, not raising prawns. | Mistranslation of '人人都能养龙虾' (everyone can run OpenClaw). Should be 'everyone can run OpenClaw' not 'raise prawns'. '龙虾' is slang for OpenClaw. | Mistranslation of slang '养龙虾'. 'Prawns' is biologically different from 'Lobster' (the intended slang for OpenClaw), and 'raise' implies animal husbandry rather than software deployment. |
| CohereLabs/tiny-aya-water | Title: Don't need Mac mini or old PC: Free deployment of OpenClaw, everyone can raise prawns |
致命
[准确性]
"everyone can raise prawns"
理由: Mistranslation of '养龙虾' (running OpenClaw). Should be 'run OpenClaw' not 'raise prawns'. The source uses 龙虾 (lobster) as a metaphor for OpenClaw. | "养龙虾" here means running/using OpenClaw; "prawns" is wrong animal and loses metaphorical meaning. | Inconsistent and incorrect translation of '龙虾'. While 'lobster' was used in other items, 'prawns' changes the species and dilutes the specific 'OpenClaw' slang association. |
| Qwen/Qwen2.5-14B-Instruct | Title: No Mac mini Needed, No Old Computer Required: Deploy OpenClaw for Free and Raise Lobsters Content: As everyone knows, deploying OpenClaw requires either a Mac mini or an old computer... Today we will teach you how to deploy OpenClaw for free without needing an old computer... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: No Mac mini, no old computer: Free deployment of OpenClaw, anyone can farm lobsters Content: Everyone knows deploying lobster openclaw requires a Mac mini or an old computer... Today we will teach you how to deploy lobster openclaw for free without an old computer... |
严重
[准确性]
"anyone can farm lobsters"
理由: Literalizes metaphor ‘养龙虾’ as farming lobsters instead of meaning ‘run OpenClaw’ or ‘use OpenClaw’. | Should be 'everyone can run OpenClaw' - '养龙虾' is slang for running OpenClaw, not farming lobsters | Failed to translate '养龙虾' (raise lobsters) as 'run OpenClaw'. Interpreted literally as agriculture. |
| Qwen/Qwen3-14B | Niente Mac mini, niente vecchi computer: distribuzione gratuita di OpenClaw, tutti possono allevare gamberi |
严重
[准确性]
"tutti possono allevare gamberi"
理由: Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | “养龙虾” means deploying/running OpenClaw, not literally raising shrimp; in addition, “龙虾” is mistranslated as “gamberi” (shrimp), which drops the link to OpenClaw and changes the meaning. | Mistranslation of slang '养龙虾' (raising lobsters = running OpenClaw). Translated literally as 'breeding shrimp/prawns'. |
| Qwen/Qwen3.5-4B | Title: No Mac mini needed, no old computer required: Free deployment of OpenClaw, everyone can raise Lobster Content: Everyone knows that deploying OpenClaw requires a Mac mini or an old computer... Today we will teach you how to deploy OpenClaw for free without an old computer... |
严重
[准确性]
"养龙虾 / raise Lobster"
理由: Literal translation of '养龙虾'. In this tech context, it means 'run' or 'host' OpenClaw, not raise an animal. | Should be 'run OpenClaw' not 'raise Lobster'. The metaphor should be translated to the actual product name | Metaphor for running/using OpenClaw; translation keeps it literal ‘raise Lobster’ instead of clearly tying to the software. |
| Qwen/Qwen3.5-9B | Title: No Mac mini needed, no old computer required: Deploy OpenClaw for free, everyone can raise Lobsters Content: Everyone knows that deploying Lobster OpenClaw requires a Mac mini or an old computer... Today we will teach you how to deploy Lobster OpenClaw for free without an old computer... |
严重
[准确性]
"raise Lobsters"
理由: Should be 'everyone can run OpenClaw' not 'raise Lobsters' | “养龙虾” in this context means running/using OpenClaw; literal ‘raise lobsters’ changes meaning. | Literal translation of '养龙虾' (deploy/run OpenClaw). 'Raise lobsters' implies agriculture, not software deployment. | Mistranslation: 'raise Lobsters' should be 'run OpenClaw' - the slang '养龙虾' means to run/use OpenClaw, not literal raising |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Don't use Mac mini, don't use old computer: Free deployment of OpenClaw, everyone can raise shrimp | 通过 (无共识错误) |
| google/gemma-3-12b-it | No Mac Mini, No Old Computer: Free Deploy OpenClaw, Everyone Can Raise Lobsters |
轻微
[准确性]
"Everyone Can Raise Lobsters"
理由: Literal translation of the slang '养龙虾' (deploy/run OpenClaw). The hypothesis suggests breeding actual animals. | Misinterprets '养龙虾' which metaphorically means 'run/use OpenClaw'; translates as literally raising lobsters. | Mistranslation: '养龙虾' is slang for 'running OpenClaw', not literal raising lobsters. Should be 'run OpenClaw'. | Mistranslation of '养龙虾' (slang for running/deploying OpenClaw). Should be 'run OpenClaw' not 'raise lobsters' |
| google/gemma-3-1b-it | Use no Mac Mini, no old computer: Free deployment of OpenClaw, everyone can raise shrimp. |
严重
[准确性]
"Use no Mac Mini, no old computer: Free deployment of OpenClaw, everyone can raise shrimp."
理由: Uses literal 'raise shrimp' instead of recognizing '养龙虾' as idiomatic for 'running/deploying OpenClaw'. Should be 'everyone can run OpenClaw'. | Incorrect translation: '养龙虾' is slang for 'running OpenClaw', not literal 'raise shrimp'. Should be 'run OpenClaw'. | Literal translation of '养龙虾' (raise lobsters/shrimp). In this context, it means 'run OpenClaw'. 'Raise shrimp' implies aquaculture, which is incorrect. | Mistranslates metaphorical ‘养龙虾’/‘run OpenClaw’ as ‘raise shrimp’ rather than ‘run/use OpenClaw’; ‘Use no Mac Mini’ is unnatural and slightly misleading. |
| google/gemma-3-4b-it | Without Mac mini or old computers: Free deployment of OpenClaw, everyone can raise lobsters. |
严重
[准确性]
"everyone can raise lobsters"
理由: ‘养龙虾’ metaphorically means running/using OpenClaw; ‘raise lobsters’ introduces literal meaning and loses software context. | Should be 'everyone can run OpenClaw' not 'raise lobsters'. Lobster is a metaphor for OpenClaw | Literal translation of '养龙虾' (run OpenClaw) instead of the technical meaning. |
| google/translategemma-12b-it | No Mac Mini Required, No Old Computer Needed: Deploy OpenClaw for Free and Everyone Can Raise Lobsters |
严重
[准确性]
"everyone can raise lobsters"
理由: Literal translation of '养龙虾'. In this context, it means 'run/deploy OpenClaw'. Translating it as raising animals makes the title nonsensical in a tech context. | Misinterprets metaphorical “养龙虾” as literally raising lobsters instead of running OpenClaw. | Mistranslation of '养龙虾' (slang for running/deploying OpenClaw). Should be 'run OpenClaw' not 'raise lobsters' |
| google/translategemma-4b-it | No Mac mini, no old computer: Free deployment of OpenClaw, everyone can raise lobsters |
严重
[准确性]
"everyone can raise lobsters"
理由: Literalizes metaphor; should convey 'everyone can run/use OpenClaw', not actually raise lobsters. | Literal translation of the slang '养龙虾' (run OpenClaw). In this context, it refers to software deployment, not animal husbandry. The reference correctly translates it as 'run OpenClaw'. | '龙虾' is slang for OpenClaw. Should be 'run OpenClaw' not 'raise lobsters' | Incorrect translation: 'raise lobsters' should be 'run OpenClaw' - '养龙虾' is slang for running OpenClaw |
| tencent/HY-MT1.5-1.8B | Login |
致命
[准确性]
"Login"
理由: Title is about free deployment of OpenClaw; 'Login' is unrelated and loses all meaning. | The hypothesis does not match the source or reference text. | Hypothesis is completely wrong - 'Login' has no relation to the source which discusses deploying OpenClaw for free. The reference shows the correct translation. | Hypothesis 'Login' is completely unrelated to the source content about deploying OpenClaw without Mac mini. This is an incorrect placeholder. | The hypothesis completely omits the source content (Title and Content) and instead outputs an unrelated UI label 'Login', resulting in a total failure to translate the input. |
| tencent/HY-MT1.5-7B | "en": "Title: No Need for Mac mini or Old Computers: Free Deployment of OpenClaw – Everyone Can Raising Lobsters", "zh-TW": "标题:不需要Mac mini或舊電腦:免費部署OpenClaw——人人都能養龍蝦", "ja": "タイトル:Mac miniや古いコンピューターは不要です:OpenClawの無料デプロイメント——誰でもロブスターを飼育できる", "ko": "제목: Mac mini나 구형 컴퓨터가 필요 없습니다: OpenClaw의 무료 배포 — 누구나 로브스터를 키울 수 있습니다", "de":="Titel: Kein Mac mini oder alte Computer erforderlich: Kostenlose Bereitstellung von OpenClaw – Jeder kann Lobster züchten", "fr":="Titre : Pas besoin de Mac mini ou d’anciens ordinateurs : Déploiement gratuit d’OpenClaw – Tout le monde peut élever des lobster", "it":="Titolo: Non è necessario un Mac mini o un vecchio computer: Distribuzione gratuita di OpenClaw – Chiunque può allevare i lobster"} |
严重
[准确性]
"Raising Lobsters / 养龙虾"
理由: Literal translation of '养龙虾'. Should be 'run OpenClaw' or 'deploy OpenClaw'. Also contains a grammatical error ('Can Raising'). | Nickname ‘养龙虾’ means running/deploying OpenClaw; translation keeps literal raising lobsters in multiple languages instead of “run OpenClaw” | Should be 'run OpenClaw', not 'raising lobsters'. '养龙虾' in this context means deploying/running OpenClaw, not literally raising lobsters. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: It's tiring to mess around with Dragonfly |
致命
[准确性]
"Dragonfly"
理由: Incorrect rendering of “龙虾” (OpenClaw nickname); ‘Dragonfly’ refers to a different concept and breaks intended meaning. | Incorrect translation of '龙虾' (Lobster/OpenClaw). 'Dragonfly' is the wrong entity. | Mistranslation of '龙虾折腾' (tinkering with OpenClaw). Should be 'OpenClaw' not 'Dragonfly'. '龙虾' is slang for OpenClaw. |
| CohereLabs/tiny-aya-water | Title: Tiring to mess around with Dragonfly |
致命
[准确性]
"Dragonfly"
理由: Mistranslation of '龙虾' (OpenClaw). Should be 'OpenClaw' not 'Dragonfly'. The source uses 龙虾 (lobster) as a metaphor for OpenClaw. | Mistranslation of "龙虾" which refers to OpenClaw; wrong term and different animal. | Incorrect translation of '龙虾' (Lobster/OpenClaw) as 'Dragonfly'. This breaks the terminology consistency and meaning. |
| Qwen/Qwen2.5-14B-Instruct | Title: Tiring to折腾龙虾, Content: At the beginning of February, I set up a VMware and installed Windows, then ran龙虾on Windows, which crashed several times. ... Feeling that this shrimp is hard to养. |
致命
[流畅性]
"Tiring to折腾龙虾"
理由: Hypothesis contains untranslated Chinese text mixed with English - incomplete translation with code-switching | Mixed English and Chinese in the title; ungrammatical and not a valid translation. | Redundant use of '折腾龙虾'
严重
[准确性]
"龙虾"
理由: Should be rendered as OpenClaw (nickname) rather than literal ‘lobster’ in technical context. | Hypothesis fails to translate '龙虾' (slang for OpenClaw) - should be 'OpenClaw' not left in Chinese | Failed to translate the Chinese characters '折腾' (tinker/fuss with) and '龙虾' (OpenClaw). Left them in Chinese script. |
| Qwen/Qwen2.5-7B-Instruct | Title: Crab tweaking is so tiring Content: It was quite troublesome to set up Windows on VMware at the beginning of February and run Crustacean there, causing several crashes... |
严重
[准确性]
"Crab tweaking"
理由: Translates ‘龙虾’/OpenClaw as ‘Crab’, changing the subject and losing the product reference. | Source says '龙虾' (Lobster/OpenClaw). Hypothesis translates as 'Crab'. Wrong animal, wrong software reference. | Should be 'Tinkering with OpenClaw is exhausting' - '龙虾' refers to OpenClaw, not crab |
| Qwen/Qwen3-14B | Allevare gamberi è davvero stancante |
严重
[准确性]
"Allevare gamberi è davvero stancante"
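理由: Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | “龙虾” refers to OpenClaw, and the context is that tinkering with the software deployment is exhausting; rendering it as “allevare gamberi” (raising shrimp) treats the personifying metaphor as literal farming and loses the core reference to OpenClaw. | Mistranslation of slang '养龙虾' (maintaining/running OpenClaw). Translated literally as 'breeding shrimp'. |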
| Qwen/Qwen3.5-4B | Title: Running Lobster is exhausting Content: Early February I set up a VMware and installed Windows, then ran Lobster on it. It crashed several times... Honestly, this shrimp is so hard to keep. |
严重
[准确性]
"龙虾 / Lobster"
理由: Literal translation of '龙虾' (Lobster) instead of the software name 'OpenClaw'. | Community nickname for OpenClaw; literal ‘Lobster’ loses the intended software meaning. | Should be 'Tinkering with OpenClaw' not 'Running Lobster'. Title should reference OpenClaw and convey the sense of struggle |
| Qwen/Qwen3.5-9B | Title: Tired from messing around with LinuxDo Content: In early February, I set up a VMware and installed Windows, then ran LinuxDo on Windows, crashing it several times. ... Feels like this shrimp is so hard to keep alive. |
致命
[准确性]
"LinuxDo"
理由: Hallucinated entity. The source says '跑了龙虾' (ran OpenClaw), not 'LinuxDo'. LinuxDo is likely the forum name, not the software being run. | Completely wrong system name; source refers to “龙虾”/OpenClaw, not LinuxDo, which is a different project. | Wrong term: 'LinuxDo' should be 'OpenClaw' - the source '龙虾' refers to OpenClaw software, not LinuxDo | Completely wrong - should be 'Tinkering with OpenClaw is exhausting', not 'LinuxDo' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: The Hard Life of Crawfish |
严重
[准确性]
"The Hard Life of Crawfish"
理由: Translates '龙虾' as 'crawfish' instead of OpenClaw; misses the technical metaphor and changes meaning | 龙虾 here metaphorically refers to OpenClaw software; “Crawfish” suggests literal animal and loses the product reference and tinkering context. | The title '龙虾折腾的好累' refers to the difficulty of setting up/using OpenClaw. The hypothesis translates '龙虾' as 'Crawfish' (incorrect species) and interprets the sentiment as a biological 'hard life' rather than user frustration with software. |
| google/gemma-3-12b-it | It's so tiring to mess with lobsters. |
致命
[准确性]
"lobsters"
理由: Incorrect translation of '龙虾' as 'lobster' instead of 'OpenClaw' | Mistranslation: '龙虾' is slang for OpenClaw, not literal lobsters. Should be 'OpenClaw'. | Literal translation of '龙虾' (OpenClaw slang). The user is talking about software configuration, not animals. | Mistranslation of '龙虾' (slang for OpenClaw). Should be 'Tinkering with OpenClaw' not 'mess with lobsters' | '龙虾' here refers to OpenClaw; translating as literal lobsters loses the intended software meaning. |
| google/gemma-3-1b-it | Translation of '龙虾折腾的好累' in en |
致命
[准确性]
"Translation of '龙虾折腾的好累' in en"
理由: The translation does not provide the English title as required. | The hypothesis is a meta-instruction describing the task rather than performing the translation. No actual translation is provided. | Not a translation. This is a placeholder. Should translate to 'Tinkering with OpenClaw is exhausting' or similar. | Hypothesis is not a proper translation - it appears to be a placeholder or incomplete translation. | Does not translate the phrase; only repeats a meta instruction about translation. |
| google/gemma-3-4b-it | Tired of playing lobster |
严重
[准确性]
"Tired of playing lobster"
理由: Loses the reference to OpenClaw and ‘折腾’ (tinkering/troubleshooting) which is more about struggling with setup/use than ‘playing’. | Mistranslation of '折腾' (tinkering/struggling with) as 'playing', and '龙虾' as literal animal. | Should be 'Tinkering with OpenClaw is exhausting' not 'Tired of playing lobster'. Lobster is a metaphor for OpenClaw |
| google/translategemma-12b-it | Title: It's so tiring to deal with lobsters. |
严重
[准确性]
"to deal with lobsters"
理由: Literal translation of '龙虾' (lobsters). The user is complaining about the difficulty of maintaining the 'OpenClaw' software, not biological lobsters. | ‘龙虾’ refers to OpenClaw; literal ‘lobsters’ loses the product reference. | Mistranslation of '龙虾' (slang for OpenClaw). Should be 'Tinkering with OpenClaw' not 'deal with lobsters' |
| google/translategemma-4b-it | Lobster Taming is so exhausting! |
严重
[准确性]
"Lobster Taming is so exhausting!"
理由: '龙虾' is slang for OpenClaw. Should be 'Tinkering with OpenClaw' not 'Lobster Taming' | Mistranslation of slang. '龙虾' refers to the software 'OpenClaw', and '折腾' means 'tinkering' or 'struggling with', not 'taming' an animal. The title should reflect troubleshooting the software. | Misses explicit reference to OpenClaw; 'Tinkering with OpenClaw is exhausting' better reflects software context than 'taming lobsters'. | Incorrect translation: 'Lobster Taming' should be 'Tinkering with OpenClaw' - '龙虾' is slang for OpenClaw, and '折腾' means tinkering/adjusting, not taming |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '龙虾折腾的好累' in en", "zh-TW": "2 月初時開了一台 VMware,安裝了 Windows,然後在 Windows 上運行龍虾,結果失敗了好幾次。感覺這個任務真的太難了", "ja": "2 月初にVMwareを起動し、Windowsをインストールしてから、龍虾をWindows上で実行したが、何度も失敗した。この作業は本当に難しいと感じた", "ko": "2월 초에 VMware를 열고 Windows를 설치한 후, Windows에서 Dragon Boat을 실행했지만 여러 번 실패했습니다. 이 작업은 정말 어렵다고 느꼈습니다", "de": "Im Februar habe ich eine VMware-VM erstellt und Windows installiert, um dann auf Windows Dragon Boat zu testen. Es gab mehrere Fehlschläge. Ich fand diese Aufgabe wirklich schwierig", "fr": "Au début du mois de février, j'ai créé une VMware et j'ai installé Windows, puis j'ai essayé de jouer à Dragon Boat sur Windows. Il y a eu plusieurs échecs. J'ai trouvé cette tâche vraiment difficile", "it": "All'inizio di febbraio ho creato una macchina virtuale con VMware e l'ho installata su Windows, poi ho provato a giocare a Dragon Boat su Windows. Ci sono stati diversi fallimenti. Ho trovato questa attività molto difficile." }, "input": "Title: 龙虾折腾的好累 Content: 2 月初的时候开了个 VMware 装了个 windows,然后在 Windows 上跑了龙虾,搞崩了好几次。...感觉这个虾好难养啊" {"en": "Translation of '龙虾折腾的好累' in en", "zh-TW": "2 月初時開了一台 VMware,安裝了 Windows,然後在 Windows上運行龍虾,結果失敗了好幾次。感覺這個任務真的太難了", "ja": "2月初にVMwareを開いて、Windowsをインストールした後、Windows上で龍虾を実行しましたが、何度も失敗しました。この作業は本当に難しいと感じています", "ko": "2월 초에 VMware를 열고 Windows를 설치한 후, Windows상에서 용괴를 실행하려 했지만 몇 번이나 실패했습니다. 이 작업이 정말 어렵다는 것을 느꼈습니다", "de": "Im Februar habe ich eine VMware-VM erstellt und Windows installiert, um dann auf Windows Dragon Boat zu spielen. Es gab mehrere Fehlschläge. Ich fand diese Aufgabe wirklich schwierig", "fr": "Au début du mois de février, j'ai créé une VMware et j'ai installé Windows, puis j'ai essayé de jouer à Dragon Boat sur Windows. Il y a eu plusieurs échecs. J'ai trouvé cette activité très difficile.", "it": "All'inizio di febbraio ho creato una macchina virtuale con VMware e l'ho installata su Windows, poi ho provato a giocare a Dragon Boat su Windows. Ci sono stati diversi fallimenti. 
Ho trovato questa attività molto difficile."}, "input": "输入:标题:龙虾折腾的好累 内容:2 月初的时候开了个 VMware 装了个 windows,然后在 Windows 上跑了龙虾,搞崩了好几次。...感觉这个虾好难养啊" {"en": "Translation of '龙虾折腾的好累' in en", "zh-TW": "2 月初時開了一台 VMware,安裝了 Windows,然後在 Windows上運行龍虾,結果失敗了好幾次。感覺這個任務真的太難了", "ja": "2月初にVMwareを開いて、Windowsをインストールした後、Windows上で龍虾を実行しましたが、何度も失敗しました。この作業は本当に難しいと感じています", "ko": "2월 초에 VMware를 열고 Windows를 설치한 후, Windows상에서 용괴를 실행하려 했지만 몇 번이나 실패했습니다. 이 작업이 정말 어렵다는 것을 느 |
严重
[准确性]
""en": "Translation of '龙虾折腾的好累' in en""
理由: Does not actually translate the title; only a meta-description. | The hypothesis does not provide a complete translation of the source text. | Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | Hypothesis contains malformed JSON with multiple language translations and incomplete content instead of providing a single English translation. The format is incorrect and truncated. |
| tencent/HY-MT1.5-7B | Title: Exhausted from Playing with Linux |
轻微
[准确性]
"Playing with Linux"
理由: Completely inaccurate. Should be 'Tinkering with OpenClaw is exhausting'. The hypothesis incorrectly translates '龙虾' as 'Linux' when it's slang for OpenClaw. | Reference says 'OpenClaw' but hypothesis says 'Linux' - completely misinterpreted the subject | Hallucination. The source '龙虾' refers to 'OpenClaw'. The translation changes the subject entirely to 'Linux', losing the specific context of the software being discussed. | Source ‘龙虾’ is OpenClaw; translation substitutes Linux, changing the topic entirely |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Open Source Introduction】ZCF - One-click Configuration Tool for Claude Code & Codex.. Content: Latest Update Post: 【ZCF】Claude Code: What to do when no one is around on Qixi... | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: [Open Source Introduction] ZCF - One-click configuration tool for Claude Code & Codex... | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: [Self-Recommendation] ZCF - Claude Code & Codex One-click Configuration Tool... Content: Latest update post: [ZCF][Claude Code] What to do if no one accompanies you on Qixi Festival... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Open Source Recommendation] ZCF - Claude Code & Codex One-Click Configuration Tool... Content: Latest Update Post: [ZCF][Claude Code] What to do when Qixi Festival has no one to accompany... | 通过 (无共识错误) |
| Qwen/Qwen3-14B | 【Raccomandazione open source】ZCF - Strumento di configurazione a un clic per Claude Code & Codex... | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | "en": "Title: [Self-recommendation] ZCF - One-click configuration tool for Claude Code & Codex...",\n \"zh-TW\": \"標題:【自薦】ZCF - Claude Code 與 Codex 的一鍵配置工具...\",\n \"ja\": \"タイトル:[自己紹介] ZCF - Claude Code と Codex のワンクリック設定ツール...\",\n \"ko\": \"제목: [자신 소개] ZCF - Claude Code 및 Codex 의 한 번 클릭 설정 도구...\",\n \"de\": \"Titel: [Selbstempfehlung] ZCF - Ein-Klick-Configurierungswerkzeug für Claude Code & Codex...\",\n \"fr\": \"Titre : [Auto-promotion] ZCF - Outil de configuration à un clic pour Claude Code & Codex...\",\n \"it\": \"Titolo: [Auto-raccomandazione] ZCF - Strumento di configurazione con un click per Claude Code & Codex...\"\n} |
严重
[准确性]
"Content omitted"
理由: Content is missing from hypothesis - only provides multi-language title variants. | The latest update post content is entirely missing; hypothesis only renders the title in multiple languages. | Missing translation of the 'Content' section entirely. |
| Qwen/Qwen3.5-9B | Title: [Open Source Recommendation] ZCF - One-Click Configuration Tool for Claude Code & Codex... | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Open Source Self-Presentation] ZCF - Claude Code & Codex One-Click Configuration Tool... |
轻微
[术语]
"Self-Presentation"
理由: '自荐' in the context of open source projects is better translated as 'Self-Recommendation' (as in the reference) rather than 'Self-Presentation', which sounds like a personal bio. | Uses 'Self-Presentation' instead of 'Self-Recommendation' which is more accurate for '自荐' | “自荐” in this forum context is better rendered as self-recommendation; self-presentation is understandable but non-idiomatic. |
| google/gemma-3-12b-it | Title: [Open Source Self-Introduction] ZCF - Claude Code & Codex One-Click Configuration Tool... | 通过 (无共识错误) |
| google/gemma-3-1b-it | Latest Updates: 【ZCF】- Claude Code & Codex One-Click Configuration Tool... |
严重
[准确性]
"Latest Updates: 【ZCF】- Claude Code & Codex One-Click Configuration Tool..."
理由: Confuses title and content. The hypothesis appears to mix elements from both. Should clearly separate title from content translation. | Reorders and inflates ‘latest update post’ into ‘Latest Updates’ and omits the Qixi Festival content that follows in the source. | The hypothesis translates the title into the content slot and omits the specific content about the 'Qixi Festival' (Chinese Valentine's Day), replacing it with ellipsis or ignoring the joke context. |
| google/gemma-3-4b-it | Title: [Open Source Recommendation] ZCF - Claude Code & Codex One-Click Configuration Tool... | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [Open Source Self-Introduction] ZCF - A One-Click Configuration Tool for Claude Code & Codex... | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: [Open Source Recommendation] ZCF - Claude Code & Codex One-Click Configuration Tool... | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '登录' in en", "zh-TW": "翻譯‘登錄’在中文繁體字中”, "ja": "翻譯‘登録’在日語中”, "ko": "翻譯‘로그인’在韓語中”, "de": "翻譯‘Anmelden’在德語中”, "fr": "翻譯‘Se connecter’在法語中”, "it": "翻譯‘Accedere’在意大利語中" } |
严重
[准确性]
""en": "Translation of '登录' in en""
理由: Only meta-text; does not translate the given source at all. | Hypothesis contains malformed JSON with multiple language translations unrelated to the source content. Should provide actual translation of the ZCF tool title. | The hypothesis is malformed JSON containing meta-commentary about translating 'Login', which is not the source text. It fails to translate the provided source about ZCF and Claude Code. | The hypothesis does not provide a complete translation of the source text. | Hypothesis translates '登录' (login) which does not appear in the source. The source is about ZCF configuration tool, not login. |
| tencent/HY-MT1.5-7B | Title: [Open Source Recommendation] ZCF - Claude Code & Codex One-Click Configuration Tool... | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Metapi: A hub within a hub... suitable for individual management of public stations and more | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Metapi: A hub within a hub... suitable for individual management of public APIs and more | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: [Open Source] Metapi: A Hub for Hubs... Suitable for Personal Management of Free Shared Sites etc. Content: [Open Source] Metapi V1.2 Update: Supports Regular Expression Matching Model Addition Groups... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Open Source] Metapi: Transfer Station for Transfer Stations... Suitable for Personal Management of Public Proxy Sites etc. Content: [Open Source] Metapi V1.2 Update: Supports Regular Expression Matching Model Addition of Groups... |
轻微
[术语]
"Public Proxy Sites"
理由: Loosely renders ‘公益站’; ‘free API proxy sites’ would be more precise. | Incorrect translation: 'Public Proxy Sites' should be 'free API proxy sites'. The reference correctly uses 'free API proxy sites'. | '公益站' is often translated as 'free API proxy sites' or 'community sites'. 'Public Proxy Sites' is acceptable.
轻微
[流畅性]
"Regular Expression Matching Model Addition of Groups"
理由: Unnatural phrasing; should be ‘adding groups via regular expression matching for models’ or similar. | Should be 'Supports adding groups via regex matching for models' - awkward phrasing | Awkward phrasing: should be 'supports adding groups via regex matching for models' as in reference. | Awkward phrasing. Should be 'adding groups via regex matching for models'. |
| Qwen/Qwen3-14B | 【Open Source】Metapi: il hub dei hub di relay... adatto alla gestione personale di siti API gratuiti e altro | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: [Open Source] Metapi: The Proxy of Proxies... Suitable for managing personal free shared API sites and more Content: [Open Source] Metapi V1.2 Update: Supports regular expression matching model adding groups... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [Open Source] Metapi: A hub for hubs... Suitable for personal management of free shared APIs and more Content: [Open Source] Metapi V1.2 Update: Supports regex matching model group addition... | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Open Source] Metapi: intermediary hub... suitable for personal management of public proxies and similar... | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: [Open Source] Metapi: A Proxy Site for Proxy Sites... Suitable for Personal Management Free API Sites, etc. |
轻微
[术语]
"Free API Sites"
理由: Should be 'free API proxy sites' for consistency and clarity | Use of 'proxy site' instead of 'free API proxy sites' | Slightly imprecise; '公益站' usually refers to free API proxy sites. Adding 'proxy' would better match the source/reference nuance. |
| google/gemma-3-1b-it | Open Source: Metapi V1.2 Update: Supports regular expression matching model addition group management of free sites... |
轻微
[流畅性]
"Open Source: Metapi V1.2 Update: Supports regular expression matching model addition group management of free sites..."
理由: Awkward phrasing 'model addition group management'. Should be 'adding groups via regex matching for models' as in reference. | English is awkward and slightly unclear; ‘model addition group management of free sites’ is ungrammatical and confusing though meaning is roughly preserved. | The hypothesis omits the source title prefix '【开源】' meaning '[Open Source]'. Also '公益站' means 'free API proxy site', not just 'free sites'. |
| google/gemma-3-4b-it | Title: [Open Source] Metapi: Metamini’s Metamini… Suitable for personal management of public proxy sites, etc. |
严重
[准确性]
"Metamini’s Metamini…"
理由: Should be 'The proxy for proxy sites' not 'Metamini's Metamini'. This is a nonsensical mistranslation | Hallucination/Mistranslation. '中转站的中转站' means 'proxy for proxy sites' or 'gateway of gateways'. 'Metamini' is not in the source. | Invents ‘Metamini’ and breaks the intended metaphor ‘中转站的中转站’ (a proxy for proxy sites). | Incorrect translation - '中转站的中转站' means 'proxy for proxy sites', not 'Metamini's Metamini' |
| google/translategemma-12b-it | Title: [Open Source] Metapi: A Proxy Site for Proxy Sites... Suitable for personal management and public API sites. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: [Open Source] Metapi: Metapi for personal management of free shared API sites... |
严重
[准确性]
"Metapi: Metapi for personal management of free shared API sites..."
理由: The translation of '中转站的中转站' as 'Metapi for personal management' is incorrect and loses the core definition ('The proxy for proxy sites'). It conflates the definition with the usage case mentioned later in the source. | Incorrect translation: 'Metapi for personal management' should be 'The proxy for proxy sites' - '中转站的中转站' means proxy for proxy sites, not management | Misses the key concept. Should be 'The proxy for proxy sites' - Metapi is a proxy for other proxy sites, not just for managing them | Drops key metaphor 'proxy for proxy sites' and repeats name redundantly; loses architecture nuance. |
| tencent/HY-MT1.5-1.8B | Translation of '【开源】Metapi:中转站的中转站...适用于个人管理公益站等等' in en |
严重
[准确性]
"Translation of '【开源】Metapi:中转站的中转站...适用于个人管理公益站等等' in en"
理由: Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | The hypothesis does not provide a complete translation of the source text. | Meta-instruction only; the actual meaning about Metapi as a proxy for proxy sites is missing. | Hypothesis is a meta-description rather than an actual translation. Should provide the actual English translation. | The hypothesis is a meta-description of the task, not the translation itself. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | Title: [Open Source] Metapi: A Transfer Station for Transfer Stations... Suitable for Personal Management of Public Welfare Sites, etc. | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Is there any way to convert airport node to socks5 for browser fingerprinting? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Is there any way to convert airport node to socks5 for browser fingerprinting? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Is there a way to convert proxy provider nodes to socks5 for fingerprint browsers? Content: I just want to assign one specific node to Claude, but since I'm using a fingerprint browser, it seems like I can't directly use the proxy provider nodes... |
轻微
[术语]
"fingerprint browsers"
理由: ‘指纹浏览器’ in this context is usually rendered as ‘antidetect browser’; ‘fingerprint browser’ is understandable but nonstandard. | Incorrect translation of '指纹浏览器' as 'fingerprint browsers' instead of 'antidetect browsers' | Hypothesis says 'fingerprint browsers' but should be 'antidetect browser' - different terminology for the same concept | While 'fingerprint browser' is a direct translation, 'antidetect browser' is the more common industry term for this specific type of software, as used in the reference. |
| Qwen/Qwen2.5-7B-Instruct | Is there a way to convert airport nodes to socks5 for fingerprint browsers? |
严重
[术语]
"airport nodes"
理由: '机场' (Airport) is slang for proxy providers. Translating literally as 'airport nodes' is confusing without context. 'Proxy provider nodes' is better. | Literal translation of slang ‘机场节点’; understandable in context but could be clarified as ‘proxy provider nodes’. | Incorrect translation: 'airport nodes' should be 'proxy provider nodes'. The reference correctly uses 'proxy provider nodes'. |
| Qwen/Qwen3-14B | C'è un modo per convertire i nodi proxy in socks5 per usarli con il browser a impronta digitale? | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Is there a way to convert airport nodes to socks5 for use with fingerprint browsers? I just want to fix a node separately for Claude, but since I'm using a fingerprint browser, it seems I can't directly use airport nodes... |
轻微
[术语]
"机场节点 / airport nodes"
理由: The term 'airport nodes' is inconsistent with the reference which uses 'proxy provider nodes'. | Literal ‘airport nodes’ may confuse users unfamiliar with the slang for proxy/VPS providers; ‘proxy provider nodes’ would be clearer. | Should be 'proxy provider nodes' to match reference terminology
轻微
[术语]
"fingerprint browser"
理由: 'Antidetect browser' is the more standard industry term for '指纹浏览器', though 'fingerprint browser' is understandable. | Should be 'antidetect browser' to match reference terminology | Incorrect translation of '机场' - should be 'proxy provider' or 'VPN service', not 'airport'. This is Chinese internet slang. |
| Qwen/Qwen3.5-9B | Title: Is there a way to convert an airport node to socks5 for use with a fingerprint browser? Content: I want to assign a specific node exclusively to Claude, but since I'm using a fingerprint browser, it seems I can't directly use the airport nodes... |
轻微
[术语]
"fingerprint browser"
理由: 'Antidetect browser' is the more standard industry term for '指纹浏览器'. | “指纹浏览器” is more standardly “antidetect browser”; “fingerprint browser” is understandable but nonstandard in this niche. | Should be 'antidetect browser' instead of 'fingerprint browser' as it refers to the specific service. | Should be 'antidetect browser' for accuracy | Inconsistent: 'fingerprint browser' vs reference 'antidetect browser' - both terms used, reference is more common in this context |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Can you turn an airport node into socks5 for fingerprint browser? |
轻微
[术语]
"airport node"
理由: Literal rendering of 机场节点; in English networking slang this is usually “proxy provider node” or similar; may be confusing without context. | Uses 'airport node' and 'fingerprint browser' instead of 'proxy provider nodes' and 'antidetect browser' which are more standard technical terms | While literal, 'antidetect browser' is the standard industry term for '指纹浏览器' in this context. |
| google/gemma-3-12b-it | Title: Is there a way to convert proxy provider nodes to SOCKS5 for fingerprint browsers? Content: I want to dedicate a single node specifically for Claude, but my fingerprint browser seems unable to directly use proxy provider nodes... | 通过 (无共识错误) |
| google/gemma-3-1b-it | Can I set up a single node for Claude and use it with a fingerprint browser? |
严重
[准确性]
"Can I set up a single node for Claude and use it with a fingerprint browser?"
理由: The hypothesis simplifies the source too much, missing the specific term 'airport nodes' (proxy nodes) and the nuance of 'converting' them to SOCKS5, which is the core technical question. | Omits key idea of converting proxy provider nodes to SOCKS5 and the limitation that the current antidetect browser cannot directly use those nodes. | Misses key terminology. '机场' (proxy provider) and '指纹浏览器' (antidetect browser) are not properly translated. Should reference SOCKS5 conversion. |
| google/gemma-3-4b-it | Is there a way to convert a proxy node into a SOCKS5 for use with a fingerprint browser? | 通过 (无共识错误) |
| google/translategemma-12b-it | Is there a way to convert an airport node (proxy provider) to socks5 for use with a fingerprint browser? |
轻微
[术语]
"airport node (proxy provider)"
理由: Should be 'proxy provider nodes' for clarity, though 'airport node' is understandable | Incorrect translation of '机场' - in Chinese internet slang, '机场' means 'proxy provider/VPN service', not literal 'airport' | While understandable, “proxy provider node” or similar would align more closely with common usage; parentheses note helps but is slightly awkward. |
| google/translategemma-4b-it | Is there a way to convert an airport node into a socks5 for use with a fingerprint browser? |
轻微
[术语]
"airport node"
理由: '机场' in this technical slang context refers to a proxy/VPN provider; 'proxy provider node' is clearer than literal 'airport' to non-slang users. | While literal, 'antidetect browser' is the standard industry term for '指纹浏览器' in this context, as used in the reference. | Reference uses 'proxy provider nodes' which is clearer than 'airport node' |
| tencent/HY-MT1.5-1.8B | How can I convert an airport node into SOCKS5 for the fingerprint browser? |
轻微
[术语]
"airport node"
理由: Hypothesis uses 'airport' for '机场' which is slang for proxy provider. The reference uses the more accurate 'proxy provider'. However, the translation is understandable. | Hypothesis uses 'airport node' and 'fingerprint browser' but reference uses 'proxy provider nodes' and 'antidetect browser' which are more accurate technical terms. | '机场' in this community context means proxy/VPN provider; 'proxy provider node' would be clearer, though the meaning is still understandable. | '机场' (airport) is slang for 'proxy provider'. Translating it literally as 'airport node' is inaccurate in this technical context; 'proxy provider node' is correct. |
| tencent/HY-MT1.5-7B | Translation of '助手' in en |
致命
[准确性]
"Translation of '助手' in en"
理由: The hypothesis is unrelated to the source and reference content; it does not translate or convey any part of the original meaning. | Hypothesis is completely unrelated to source - appears to be a system message or error output instead of translation | The hypothesis is a meta-commentary/instruction rather than a translation. It fails to translate the source text entirely. | Hypothesis is not a translation at all. Should be a full translation of the title and content about converting proxy nodes to SOCKS5. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【富可敌国】 🚀 aiapi.cc · pureblood claude api site |
轻微
[风格]
"pureblood claude api site"
理由: Lack of capitalization and explanation. 'Pureblood' is a direct translation of '纯血' implying official/unmodified, but leaving it lowercase and without context (like 'Official') makes it look like a proper noun error or less professional. | Should be capitalized: 'Pureblood Claude API Site' for consistency with reference. | Inconsistent formatting - 'pureblood' vs 'Pureblood' and missing Chinese brackets |
| CohereLabs/tiny-aya-water | Title: 【富可敌国】 🚀 aiapi.cc · Purebred Claude API Station | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: [Rich as a Country] 🚀 aiapi.cc · Pureblood Claude API Station Content: Tired of reverse engineering failures? Tired of unstable channels? We only provide 100% official forwarding, stability guaranteed. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Wealth to Match a Nation] 🚀 aiapi.cc · Pure Blood Claude API Station Content: Tired of reverse engineering failures? Tired of unstable channels? We only do 100% official forwarding, stability is guaranteed. |
轻微
[术语]
"Pure Blood Claude API Station"
理由: ‘Station’ is a less natural choice than ‘site’; ‘pure blood’ vs ‘pureblood’ is minor stylistic/terminology variation. | '站' in this context is 'site' or 'platform', not 'station'. | Incorrect translation: 'API Station' should be 'API site'. The reference correctly uses 'API site'. |
| Qwen/Qwen3-14B | ...} | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: [Wealthier Than Any Country] 🚀 aiapi.cc · Pure-Blooded Claude API Station Content: Tired of reverse engineering failures? Tired of unstable channels? We only do 100% official forwarding; stability is guaranteed. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [Wealthier Than Nations] 🚀 aiapi.cc · Pureblood Claude API Site Content: Tired of reverse engineering failures? Tired of unstable channels? We only do 100% official forwarding; stability is unmatched. |
轻微
[术语]
"Pureblood Claude API Site"
理由: Missing clarification: 'Pureblood' alone vs reference 'Pureblood (Official)' - reference clarifies meaning | Should be 'Pureblood (Official) Claude API Site' to clarify meaning | While '纯血' literally means pureblood, in API contexts 'Official' is often clearer, though 'Pureblood' is acceptable slang. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: 【Rich and Strong Country】 🚀 aiapi.cc · Pure Blood Claude API Station Content: Tired of reverse failures? Tired of unstable channels? We only do 100% official forwarding, stability is guaranteed. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: [Rich Enough to Rival a Country] 🚀 aiapi.cc · Purebred Claude API Site Content: Tired of reverse engineering failures? Tired of unstable channels? We only do 100% official forwarding, unparalleled stability. |
轻微
[术语]
"Purebred"
理由: '纯血' in this context implies 'official/original'; 'pureblood (official)' would better capture the marketing nuance. | Reference uses 'Pureblood' which is more common in this context | Should be 'Pureblood (Official) Claude API Site' to better convey the meaning of '纯血' | Use of 'Purebred' instead of '100% official forwarding' |
| google/gemma-3-1b-it | Tired of being stuck in reverse? Tired of unstable channels? |
严重
[准确性]
"Tired of being stuck in reverse?"
理由: Mistranslation of '逆向翻车'. '逆向' refers to reverse-engineering APIs, and '翻车' means failing or crashing. 'Stuck in reverse' is an idiom about progress, not API reverse-engineering failures. | Mistranslates ‘逆向翻车’ (reverse-engineered APIs breaking) as ‘being stuck in reverse’, losing technical context. | Hypothesis is only a partial translation - it only translates the subtitle/description, missing the main title '【富可敌国】 🚀 aiapi.cc · 纯血 Claude API 站'. | Completely misses meaning. '逆向翻车' means reverse-engineered APIs breaking/failing, not 'being stuck in reverse'. Should be 'Tired of reverse-engineered APIs breaking?' |
| google/gemma-3-4b-it | "en": "Title: [Wealthy Enough to Rival] 🚀 aiapi.cc · Pure Blood Claude API Station\nContent: Tired of reverse engineering crashes? Tired of unstable channels? We only do 100% official forwarding, and the stability is unmatched.", "zh-TW": "標題:【財力雄厚】 🚀 aiapi.cc · 純血 Claude API 站\n內容:厭倦了逆向翻車?厭倦了渠道不穩定?我們只做 100% 正式轉發,穩定性無可取代。", "ja": "タイトル:【富豪並みに】 🚀 aiapi.cc · 純粋なClaude APIステーション\nコンテンツ:逆向エンジニアリングによるクラッシュにうんざりですか?不安定なチャネルにうんざりしますか?私たちは100%公式の転送のみを行い、安定性は最高です。", "ko": "제목: [부유함에 걸맞은] 🚀 aiapi.cc · 순혈 Claude API 스테이션\n내용: 역공학으로 인한 충돌에 지치셨나요? 불안정한 채널에 지치셨나요? 우리는 100% 공식 전송만 진행하며 안정성은 최고입니다.", "de": "Titel: \[Reich genug um alles zu übertreffen] 🚀 aiapi.cc · Reine Blut Claude API Station\nInhalt: Sind Sie es leid, Reverse Engineering-Krashen zu erleiden? Sind Sie es leid, instabile Kanäle zu haben? Wir führen nur 100 % offizielle Weiterleitungen durch und die Stabilität ist unübertroffen.", "fr": "Titre : \[Riche comme Crésus] 🚀 aiapi.cc · Station d'API Claude Pure Sang\nContenu : Fatigué des crashs liés à l'ingénierie inverse ? Lassé par les canaux instables ? Nous ne faisons que des retransmissions officielles à 100 %, et la stabilité est inégalée.", "it": "Titolo: \[Ricco come un Re] 🚀 aiapi.cc · Stazione API Claude Pura Sangue\nContenuto: Stanco di crash dovuti all'ingegneria inversa? Scontento dei canali instabili? Realizziamo solo trasmissioni ufficiali al 100%, e la stabilità è impareggiabile." } <|im_end|> | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [Powerful Enough to Rival a Country] 🚀 aiapi.cc - Pure Claude API Site Content: Tired of reverse engineering failures? Tired of unstable channels? We only provide 100% official forwarding, ensuring stability. | 通过 (无共识错误) |
| google/translategemma-4b-it | "en": "Title: [Rich as Heaven] 🚀 aiapi.cc · Pure Blood Claude API Station\nContent: Tired of reverse engineering and crashes? Tired of unstable channels? We only do 100% official forwarding, stability is guaranteed.", "zh-TW": "標題:【財力雄厚】 🚀 aiapi.cc · 純血 Claude API 站\n內容:厭倦了逆向工程和崩潰嗎?厭倦了不穩定的管道?我們只做 100% 正式轉發,穩定性保證。", "ja": "タイトル:【莫大な財力】 🚀 aiapi.cc · 純粋なClaude APIステーション\nコンテンツ:逆向エンジニアリングやクラッシュにうんざりですか?不安定なチャネルにうんざりですか?私たちは100%の公式転送のみを行い、安定性を保証します。", "ko": "제목: \[부유함]\n콘텐츠: 역공학이나 충돌에 지치셨나요? 불안정한 채널에 지치셨나요? 우리는 100% 공식 전송만 제공하며 안정성을 보장합니다.", "de": "Titel: \[Reich wie Gott]\nInhalt: Sind Sie es leid, mit Reverse Engineering und Abstürzen zu kämpfen? Sind Sie es leid, unzuverlässige Kanäle zu verwenden? Wir bieten nur 100 % offizielle Weiterleitungen an und garantieren Stabilität.", "fr": "Titre : \[Riche comme le Diable]\nContenu : Êtes-vous fatigué des problèmes de reverse engineering et d'abattis ? Êtes-vous fatigué des canaux instables ? Nous ne faisons que du transfert officiel à 100 %, avec une garantie de stabilité." } |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Drawing】Draw one free airport subscription | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: [Draw] Draw one free airport subscription |
严重
[准确性]
"free airport subscription"
理由: Mistranslation of '一分机场' (1-cent proxy provider subscription). Should be '1-cent proxy provider subscription' not 'free airport subscription'. The '一分' refers to the price, not that it's free. | Literal mistranslation of '机场' (Airport/Proxy Provider). It implies a subscription to an actual airport. | Reference shows '1-cent proxy provider subscription' not 'free'. The source says '一分' which means 1 cent, not free. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Draw] Draw one year of proxy subscription Content: Drawing theme: draw one proxy subscription, valid until 2026-04-22! |
严重
[准确性]
"Draw one year of proxy subscription"
理由: Misinterpreted '一分' (one cent) as 'one year'. The source says the subscription costs 1 cent, not that it lasts one year (though the date implies duration, the '一分' specifically refers to price). | Misinterprets ‘一分机场订阅’ (1-cent subscription) as ‘one year’; changes the benefit described. | Hypothesis says 'one year' but source says '一分' (one cent/one month) - incorrect duration interpretation | Misinterpretation of '一分机场订阅' as 'one year of proxy subscription' instead of '1-cent proxy provider subscription' |
| Qwen/Qwen2.5-7B-Instruct | Title: [Draw] Draw One Proxy Subscription Content: Draw Theme: Draw One Airport Subscription, Valid Until 2026-04-22! |
严重
[术语]
"Draw One Proxy Subscription"
理由: Incorrect translation: 'Draw One Proxy Subscription' should be 'Giving away a 1-cent proxy provider subscription'. The reference includes the price detail '1-cent' which is important. | Should be 'Giving away a 1-cent proxy provider subscription' - more specific and accurate | Failed to translate the slang '机场' (proxy provider). Literal translation 'Airport Subscription' is incorrect. |
| Qwen/Qwen3-14B | 【Sorteggio】Vinci un abbonamento proxy di 1 minuto |
严重
[准确性]
"abbonamento proxy di 1 minuto"
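理由: Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | “一分机场订阅” refers to a proxy provider ("airport") subscription priced at 1 fen (0.01 yuan); the hypothesis misreads it as a “1-minute subscription”, treating a price as a duration and changing the meaning. | Mistranslation of '一分' (one cent). 'Uno centesimo' or '1 cent' is correct. '1 minuto' means 'one minute'. |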
| Qwen/Qwen3.5-4B | Title: [Giveaway] Win one month's subscription to a proxy provider Content: Giveaway theme: Win one subscription to a VPN service, valid until 2026-04-22!... |
严重
[准确性]
"一分机场订阅 / one month's subscription"
理由: Should be 'a 1-cent proxy provider subscription' not 'one month's subscription'. The '一分' refers to price, not duration | Misinterprets ‘一分’ (1 cent/very cheap) as ‘one month’; changes the nature and value of the giveaway. | The source '一分' means 'one cent', indicating the price/value. The hypothesis changes this to 'one month', altering the meaning.
轻微
[术语]
"VPN service"
理由: Should be 'proxy provider' not 'VPN service' | Uses 'VPN service' while reference uses 'proxy provider' - both are acceptable translations of '机场' but 'proxy provider' is more accurate in this context. | The term 'VPN service' is inconsistent with the reference which uses 'proxy provider subscription'. |
| Qwen/Qwen3.5-9B | Title: [Giveaway] Win a one-day proxy subscription Content: Giveaway theme: Win a proxy subscription, valid until 2026-04-22!... |
严重
[准确性]
"one-day proxy subscription"
理由: Mistranslation of '一分' (one cent) as 'one-day'. The source refers to the price (1 cent), not the duration. | Wrong translation: 'one-day' should be '1-cent' - the source '一分' means one cent, not one day | “一分机场订阅” refers to very cheap (1-cent) subscription, not one-day duration; incorrect unit (time vs price). | Should be 'Giving away a 1-cent proxy provider subscription' not 'one-day' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Sweepstakes] Win a One-Year Proxy Subscription |
严重
[准确性]
"One-Year Proxy Subscription"
理由: Introduces duration “one-year” not present in source; original emphasizes 1-cent subscription, not length of time. | Adds 'One-Year' which is not in source; source says '一分机场' (one-cent proxy) not one-year | Incorrect translation - '一分' means 'one cent' (0.01), not 'one year'. |
| google/gemma-3-12b-it | Title: [Giveaway] Give away a proxy provider subscription Content: Giveaway theme: Give away a proxy provider subscription, valid until 2026-04-22!... | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of '抽奖' in en |
致命
[准确性]
"Translation of '抽奖' in en"
理由: The translation does not provide the English title as required. | The hypothesis is a meta-instruction describing the task rather than performing the translation. No actual translation is provided. | Not a translation. This is a placeholder. Should translate to 'Lucky Draw' or 'Giveaway'. | Hypothesis is not a proper translation - it appears to be a placeholder or incomplete translation. | No translation is provided; only a meta phrase is output. |
| google/gemma-3-4b-it | 【Giveaway】Draw one one-minute proxy subscription |
严重
[准确性]
"one one-minute proxy subscription"
理由: Misinterprets ‘一分机场’ (1-cent proxy provider) as ‘one-minute’; changes price descriptor to time-based duration. | Mistranslation of '一分' (one cent) as 'one-minute'. Context is price, not duration. | Should be '1-cent proxy provider subscription' not 'one-minute proxy subscription'. The reference clarifies it's about price (1 cent), not duration | Incorrect translation - '一分机场订阅' means '1-cent proxy provider subscription', not 'one-minute proxy subscription' | Misinterpretation of '一分' as duration instead of cost. |
| google/translategemma-12b-it | Title: [Giveaway] Win a one-month proxy subscription Content: Giveaway theme: We're giving away a proxy subscription, valid until 2026-04-22!... |
严重
[准确性]
"one-month proxy subscription"
理由: Incorrect translation - source says '一分机场订阅' (one-cent proxy subscription), not 'one-month' | The source says '一分' (one cent), referring to the price. The hypothesis translates it as 'one-month', referring to duration. This is a factual error. | Source means “1-cent airport/proxy subscription”, not “one-month”; misreads the numeral and changes the advertised offer. | Should be 'one-cent proxy provider subscription' - the Chinese '一分' refers to price (one cent) not duration |
| google/translategemma-4b-it | Title: [Giveaway] Win a 1-minute airport subscription |
严重
[准确性]
"1-minute airport subscription"
理由: Serious mistranslation: '一分机场订阅' refers to a price of one cent (一分钱), not a duration of one minute; it swaps the unit of measure (price vs. time) and changes the meaning of the offer. | Incorrect unit; should be '1-cent subscription' | Should be '1-cent proxy provider subscription' not '1-minute'. '一分' means one cent (currency), not one minute | Critical error. '一分' means 'one cent' (price), but the hypothesis translates it as '1-minute' (time). Also, 'airport' is a literal mistranslation of the slang for 'proxy provider'. | Incorrect translation: '1-minute' should be '1-cent', 'airport' should be 'proxy provider' - '一分' means 1 cent, not 1 minute; '机场' refers to proxy providers |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 【抽奖】抽一个一分机场订阅' in en |
严重
[准确性]
"Translation of 'Title: 【抽奖】抽一个一分机场订阅' in en"
理由: Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | The hypothesis is a meta-description of the task, not the translation. It fails to translate the source text. | The hypothesis does not provide a complete translation of the source text. | Hypothesis is a meta-description rather than an actual translation. Should provide the actual English translation. | No translation, only meta-text. Fails to convey the lucky draw and proxy subscription meaning. |
| tencent/HY-MT1.5-7B | Title: [Lottery] Win a one-minute airport subscription |
轻微
[准确性]
"one-minute airport subscription"
理由: Mistranslation of '一分' (one cent) as 'one-minute'. Also translates '机场' (airport/proxy provider) literally as 'airport', missing the slang for proxy services. | Misreads ‘一分机场订阅’ (1‑cent proxy provider subscription); renders as time duration instead of price and misinterprets 机场 as physical airport | Reference says '1-cent' but hypothesis says 'one-minute' - completely wrong translation of '一分' | Mistranslation of '一分机场'. Should be '1-cent proxy provider subscription', not 'one-minute airport subscription'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Left Minister's Public Good Station Reset Level |
严重
[准确性]
"Public Good Station Reset Level"
理由: Complete failure to translate slang and proper nouns. '左大臣' is a name (Zuodachen), '公益站' is a free proxy site, and '额度重置' is quota reset. The hypothesis produces gibberish. | Incorrect translation of '额度重置' - should be 'quota reset', not 'reset level' | Mistranslation of '左大臣公益站额度重置' (Quota reset for Zuodachen free API proxy site). Should be 'Quota reset' not 'Reset Level'. Also '左大臣' is a proper name (Zuodachen), not 'Left Minister'. | Misrenders ‘公益站额度重置’; should indicate quota/credit reset for a free API proxy site, not a ‘level’ reset. |
| CohereLabs/tiny-aya-water | Title: Left Minister's Public Utility Station Reset |
Minor
[Terminology]
"Public Utility Station"
Rationale: Literal and unnatural for "公益站"; should indicate free/proxy API nature. | Incorrect translation of '左大臣' - should be transliteration 'Zuodachen', not translation. Also '公益站' should be 'free API proxy site', not 'Public Utility Station'. | Incorrect translation of 'Quota reset for Zuodachen' as 'Left Minister's Public Utility Station'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Left Minister Free Shared API Site Quota Reset Content: Now with rate limiting, up to 30 requests per minute, still usable with Dragon Crab and Claude |
Major
[Accuracy]
"Dragon Crab"
Rationale: Hypothesis says 'Dragon Crab' but should be 'OpenClaw' - fails to recognize slang terminology | Inconsistent and incorrect translation of '大龙虾' (OpenClaw). Previous items used 'lobster', this uses 'Dragon Crab'. It should be consistent with the project name 'OpenClaw'. | ‘大龙虾’ is OpenClaw nickname; ‘Dragon Crab’ mis-translates and hides reference to OpenClaw. |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: Free Shared API Site Quota Reset\nContent: Now there is a speed limit of 30 times per minute, still can use Lobster and Claude", "zh-TW": "Title: 左大臣公益站額度重置\nContent: 現在增加限速,每分鐘30次,仍然可以使用大龍蝦和Claude", "ja": "タイトル: 自由共有APIサイトクォータリセット\n内容: 今では毎分30回まで制限がありましたが、ロブスターとクラウデはまだ使えます", "ko": "제목: 좌대신인 공익站在额度重置\n내용: 이제 분당 제한이 30회로 설정되었지만, 게살과 클로드는 여전히 사용할 수 있습니다", "de": "Titel: Freier geteilter API-Site-Grenzwert Neustart\nInhalt: Es gibt jetzt eine Geschwindigkeitsbegrenzung von 30 Mal pro Minute, es kann immer noch die Krabbe und Claude verwendet werden", "fr": "Titre: Réinitialisation du quota de la station d'utilités publiques Left Minister\nContenu: Il y a maintenant une limitation de vitesse de 30 fois par minute, on peut toujours utiliser le Homard et Claude", "it": "Titolo: Ricalcolo del quoziente della Stazione Benefica Left Minister\nContenuto: Ora c\'è una velocità massima di 30 volte al minuto, si può ancora utilizzare il Gambero Rosso e Claude" } | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Ripristino del limite del sito proxy gratuito del ministro di sinistra | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Left Minister Free Shared API Site Quota Reset Content: Rate limiting has been added - 30 requests per minute. The Big Lobster and Claude can still be used. |
Major
[Accuracy]
"大龙虾 / Big Lobster"
Rationale: Nickname for OpenClaw; literal ‘Big Lobster’ loses the intended meaning of the specific software. | Literal translation of slang '大龙虾' referring to 'OpenClaw'. | Should be 'OpenClaw' not 'Big Lobster'
Major
[Accuracy]
"Left Minister Free Shared API Site"
Rationale: Literal translation of '左大臣' (a name/title). It should be transliterated or kept as the proper noun 'Zuodachen'. | Should be 'Zuodachen free API proxy site' not 'Left Minister Free Shared API Site'. '左大臣' is a site name, not a title | Incorrect translation of '左大臣' - should be transliterated as 'Zuodachen', not translated as 'Left Minister'. |
| Qwen/Qwen3.5-9B | Title: Left Minister Free Shared API Site Quota Reset Content: Speed limit increased now - 30 times per minute. Still usable with Big Lobster and Claude. |
Major
[Accuracy]
"Big Lobster"
Rationale: Mistranslation: 'Big Lobster' should be 'OpenClaw' - software name used as slang | Literal translation of slang '大龙虾' (OpenClaw). | Should be 'OpenClaw' not 'Big Lobster' | “大龙虾” is OpenClaw; literal “Big Lobster” hides the product name. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Free Shared Site Quota Reset |
Minor
[Accuracy]
"Free Shared Site"
Rationale: Generic title; reference specifies 'Zuodachen free API proxy site' which provides important context | '左大臣' is a specific name (Minister of the Left) or nickname for the site, which is omitted in the hypothesis. '公益站' is translated generally as 'Free Shared Site' which is acceptable but less precise than the reference. | Title omits specific site name “左大臣” and API/proxy nature; becomes generic and loses reference specificity. |
| google/gemma-3-12b-it | Title: Left Minister Public Proxy Quota Reset Content: The rate limit has been increased to 30 times per minute. You can still use Da Long Xia and Claude. |
Minor
[Accuracy]
"Da Long Xia"
Rationale: Mistranslation: '大龙虾' is slang for OpenClaw, not literal 'Da Long Xia'. Should be 'OpenClaw'. | Failure to translate the slang '大龙虾' (OpenClaw). Using Pinyin 'Da Long Xia' is unintelligible to the target audience. | Mistranslation of '大龙虾' (slang for OpenClaw). Should be 'OpenClaw' not 'Da Long Xia' | '大龙虾' is slang for OpenClaw; should use 'OpenClaw' instead of transliteration that hides intended meaning. |
| google/gemma-3-1b-it | The 'left demon charity' allowance has been reset. Now there is a speed limit of 30 times per minute and you can still use Dragon Shrimp and Claude. |
Minor
[Accuracy]
"'left demon charity'"
Rationale: Literal translation of '左大臣' (Zuodachen, a proper name/title) as 'left demon'. This is a severe hallucination/mistranslation of a proper noun. | Mistranslates proper name ‘左大臣’ as ‘left demon’ instead of transliteration or correct name. | Incorrect translation: '左大臣' is a transliteration of 'Zuodachen' (a specific site name), not 'left demon'. '大龙虾' is slang for OpenClaw, not 'Dragon Shrimp'. | Completely incorrect translation of '左大臣公益站'. Should be 'Zuodachen free API proxy site', not 'left demon charity'. |
| google/gemma-3-4b-it | Left Minister Free Shared API Site Quota Reset | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Left Minister Public Proxy Quota Reset Content: We have increased the rate limit to 30 requests per minute. You can still use Da Long Xia and Claude. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Left Minister Free Shared API Site Quota Reset Content: Now increased rate limit, 30 requests per minute, still can use Big Lobster and Claude |
Major
[Accuracy]
"Big Lobster"
Rationale: Literal translation of the slang '大龙虾' (referring to the OpenClaw software). Should be translated as 'OpenClaw' or 'the software'. | '大龙虾' is slang for OpenClaw, should be 'OpenClaw' not 'Big Lobster' | '大龙虾' here refers to OpenClaw; literal 'Big Lobster' loses product reference.
Major
[Accuracy]
"Left Minister Free Shared API Site"
Rationale: Should be 'Zuodachen' (a proper name/site name) not 'Left Minister'. '左大臣' is the site name, not a title | Literal translation of the proper noun '左大臣' (Zuodachen). Proper names should be transliterated (Zuodachen) or kept as is, not translated by meaning. | Incorrect translation: 'Left Minister' should be 'Zuodachen' - '左大臣' is a name, not a literal translation |
| tencent/HY-MT1.5-1.8B | Translation of '左大臣公益站额度重置' in en |
Major
[Accuracy]
"Translation of '左大臣公益站额度重置' in en"
Rationale: Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | The hypothesis is a meta-description of the task, not the translation. It fails to translate the source text. | The hypothesis does not provide a complete translation of the source text. | Meta-instruction only; does not convey 'Quota reset for Zuodachen free API proxy site'. | Hypothesis is a meta-description rather than an actual translation. Should provide the actual English translation. |
| tencent/HY-MT1.5-7B | Title: Left Minister's Public Welfare Site Quota Reset |
Major
[Accuracy]
"Left Minister's Public Welfare Site"
Rationale: Literal translation of the proper noun '左大臣' (Zuodachen/username) and '公益站'. 'Left Minister' is nonsensical in English; the name should be transliterated or kept as is. 'Public Welfare Site' is also the wrong term for a free API host. | Reference says 'Zuodachen' (a proper noun) but hypothesis says 'Left Minister' - mistranslation of Chinese characters | Should be 'Zuodachen free API proxy site', not 'Left Minister's Public Welfare Site'. '左大臣' is a site name, not a title. |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: notion's ai reverse proxies notion 2api after various issues |
Major
[Fluency]
"notion's ai reverse proxies notion 2api after various issues"
Rationale: Unnatural English phrasing - 'after various issues' is awkward; should be 'various issues with' | Unintelligible sentence structure; fails to clearly express that using Notion2API as a reverse proxy for Notion AI causes issues. | Awkward phrasing. Should be 'Various issues after reverse proxying Notion AI with Notion2API' or similar. Current phrasing is unclear. |
| CohereLabs/tiny-aya-water | Content: notion's ai reverse-proxy notion 2api after putting into newapi only replies with first sentence what have you encountered yet? |
Minor
[Fluency]
"what have you encountered yet?"
Rationale: Translation is understandable but lacks proper punctuation and natural English flow. Should be more structured sentences. | Poor fluency and grammar. The sentence structure is broken and reads like a machine translation without proper syntactic adjustment. | Ungrammatical question formation; should be "have you encountered this?" or similar. | Poor grammar and awkward phrasing. Should be 'After reverse proxying Notion AI with Notion2API and putting it into NewAPI, it only replies with the first sentence. Have you guys encountered this?' |
| Qwen/Qwen2.5-14B-Instruct | Title: Various issues after reverse proxying notion ai to notion 2api Content: After placing the notion ai reverse proxied to notion 2api into newapi, it only replies with the first sentence. Have you encountered this? |
Minor
[Terminology]
"newapi"
Rationale: Should be 'NewAPI' with proper capitalization | Likely ‘NewAPI’; missing capitalization may hinder recognition. | Hypothesis uses lowercase 'newapi' but reference uses 'NewAPI' - inconsistent capitalization of proper noun |
| Qwen/Qwen2.5-7B-Instruct | Title: notion's ai reverse proxy notion 2api后的各种问题 Content: notion's ai reverse proxied to newapi only replies with the first sentence. Have you encountered this issue? |
Major
[Fluency]
"notion's ai reverse proxy notion 2api后的各种问题"
Rationale: Title contains untranslated Chinese. Should be 'Various issues after reverse proxying Notion AI with Notion2API' | Mixed English and Chinese without proper spacing/capitalization; slightly awkward but understandable. | Untranslated text: 'notion 2api后的' should be translated. The reference correctly translates this as 'after reverse proxying Notion AI with Notion2API'. |
| Qwen/Qwen3-14B | Title: Various issues after reverse proxying Notion's AI to Notion 2API Content: After reverse proxying Notion's AI to Notion 2API and placing it in newapi, it only replies with the first sentence. Have you encountered this? | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Various issues with reverse proxying Notion's AI through 2API Content: When placing Notion's AI reverse-proxyed responses from 2API into NewAPI, only the first sentence is returned. Have you encountered this? | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Various issues after reverse proxying Notion's AI to Notion 2API Content: When placing Notion's AI reverse proxied to Notion 2API inside NewAPI, it only replies with the first sentence. Have you encountered this? | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Reverse Proxy of notion's AI to notion 2API after that putting into newAPI only responds with the first sentence "Have you encountered this?". | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: Notion AI Reverse Proxy - Various Issues After Noticing 2API Content: After reverse proxying Notion AI with Notion 2API and placing it in newapi, only the first sentence is returned. Have you encountered this? |
Major
[Accuracy]
"After Noticing 2API"
Rationale: Mistranslation. Should be 'Notion2API' not 'Noticing 2API' | Typo: Should be 'Notion2API' not 'Noticing 2API' | Misparses 'notion 2api'; should be 'after reverse proxying Notion AI with Notion2API', not 'after noticing 2API'. | Mistranslation of 'Notion 2api'. The hypothesis reads as if the user is 'noticing' an API, rather than using a tool named 'Notion2API'. |
| google/gemma-3-1b-it | Translation of 'notion的ai 反代' in en |
Critical
[Accuracy]
"Translation of 'notion的ai 反代' in en"
Rationale: No translation is given; only repeats a meta instruction. | The translation does not provide the English title as required. | The hypothesis is a meta-instruction describing the task rather than performing the translation. No actual translation is provided. | Not a translation. This is a placeholder. Should translate to 'Various issues after reverse proxying Notion AI'. | Hypothesis is not a proper translation - it appears to be a placeholder or incomplete translation. |
| google/gemma-3-4b-it | Title: notion AI reverse proxy notion 2api issues after Content: notion AI reverse proxy notion 2api issues after putting it in newapi only replies the first sentence Have you encountered this? |
Major
[Fluency]
"Title: notion AI reverse proxy notion 2api issues after Content: notion AI reverse proxy notion 2api issues after"
Rationale: Incomplete and awkwardly phrased. Should be 'Various issues after reverse proxying Notion AI with Notion2API' | Title and content concatenated incorrectly; missing punctuation and repetitions make it hard to parse. | The hypothesis merges the title and content into a single garbled string with 'Content:' embedded in the middle, lacking proper structure. |
| google/translategemma-12b-it | Title: AI for Notion - Issues with Reverse Proxying Notion 2API Content: When using an AI service for Notion and reverse proxying the Notion 2API to a new API endpoint, only the first sentence is returned. Have you encountered this issue? | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Notion AI Reverse Proxy Notion 2API Issues After |
Major
[Fluency]
"Notion AI Reverse Proxy Notion 2API Issues After"
Rationale: The sentence structure is broken and ungrammatical. The word 'After' is dangling at the end, making the title incoherent. | Incomplete and awkward. Should be 'Various issues after reverse proxying Notion AI with Notion2API' | Word order is broken and ungrammatical; should be 'Various issues after reverse proxying Notion AI with Notion2API' or similar. | Incomplete/awkward phrasing: 'Issues After' is grammatically incorrect, should be 'Various Issues After' or similar |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: notion的ai 反代 notion 2api 后的 各种问题' in en |
Major
[Accuracy]
"Translation of 'Title: notion的ai 反代 notion 2api 后的 各种问题' in en"
Rationale: Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | The hypothesis does not provide a complete translation of the source text. | The hypothesis is a meta-description of the task, not the translation. It fails to translate the source text. | Hypothesis is a meta-description rather than an actual translation. Should provide the actual English translation. | Meta-text with no real translation; misses all semantic content. |
| tencent/HY-MT1.5-7B | Assistant |
Critical
[Accuracy]
"Assistant"
Rationale: Hypothesis is incomplete and incorrect. Should be a full translation of the title about Notion AI reverse proxy issues. | The hypothesis contains only the word 'Assistant', failing to translate any part of the source text. | Hypothesis does not translate or relate to the source; it’s just a single word unrelated to the Notion AI proxy issue | Hypothesis is completely unrelated to source - appears to be a system message or placeholder instead of translation |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Official Dragon Crab Launch Content: First batch of Shenzhen Futian's "Official Dragon Crab" online - Really no problem? Can the official dragon crab be controlled? |
Critical
[Accuracy]
"Official Dragon Crab"
Rationale: Mistranslation of ‘政务龙虾’ which stands for Government OpenClaw; ‘Dragon Crab’ is incorrect term choice. | Mistranslation of '政务龙虾'. 'Dragon Crab' is incorrect for 'Lobster', and the slang reference to OpenClaw is lost. 'Government OpenClaw' is the intended meaning. | Mistranslation of '政务龙虾' (Government OpenClaw). Should be 'Government OpenClaw' not 'Official Dragon Crab'. '龙虾' is slang for OpenClaw. |
| CohereLabs/tiny-aya-water | Title: Official Launch of Government Dragon Crab Content: First batch of "Government Dragon Crab" launched in Futian District, Shenzhen. Really safe? Can government Dragon Crab be controlled? |
Critical
[Accuracy]
"Government Dragon Crab"
Rationale: Wrong rendering of "政务龙虾" (Government OpenClaw); "Dragon Crab" is incorrect and misleading. | Mistranslation of '政务龙虾' (Government OpenClaw). Should be 'Government OpenClaw' not 'Government Dragon Crab'. The source uses 龙虾 (lobster) as a metaphor for OpenClaw. | Incorrect translation of '龙虾' (OpenClaw). 'Dragon Crab' is nonsensical in this context and breaks the established terminology. |
| Qwen/Qwen2.5-14B-Instruct | Title: Government Lobster Launched Content: The first batch of 'Government Lobsters' launched in Futian, Shenzhen. Really, is it okay? Is the 'Government Lobster' really controllable? | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Government Lobster Launches Content: The first batch of 'Government Lobster' launches in Shenzhen Futian. Is everything really fine? Can government lobster really be controlled? |
Major
[Accuracy]
"Government Lobster"
Rationale: Should be 'Government OpenClaw goes online' - '龙虾' is slang for OpenClaw | Failed to translate '政务龙虾' (Government OpenClaw). Interpreted literally as a crustacean owned by the government. | Translates ‘政务龙虾’ literally instead of as ‘Government OpenClaw’ (the named system). |
| Qwen/Qwen3-14B | Granchio governativo lanciato |
Major
[Accuracy]
"Granchio governativo"
Rationale: '政务龙虾' refers to the government edition of OpenClaw; rendering it as a literal 'crab' (granchio) without the OpenClaw product name loses the core software meaning. | Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | Mistranslation of slang '政务龙虾' (Government OpenClaw). Translated literally as 'Government Crab'. |
| Qwen/Qwen3.5-4B | Title: Government Lobster Goes Live Content: Shenzhen Futian's first batch of "Government Lobsters" go live. Is this really okay? Are government lobsters truly controllable? |
Major
[Accuracy]
"政务龙虾 / Government Lobster"
Rationale: Literal translation of '政务龙虾'. 'Lobster' here is slang for 'OpenClaw'. | Should be 'Government OpenClaw' not 'Government Lobster' | Refers to a government-branded OpenClaw deployment; literal ‘lobster’ obscures that it is the OpenClaw system. |
| Qwen/Qwen3.5-9B | "en": "Title: Government Lobster Goes Online\nContent: Shenzhen Futian's first batch of 'Government Lobsters' goes online. Is it really safe? Are government lobsters truly controllable?", "zh-TW": "標題:政務龍蝦上線\n內容:深圳福田首批「政務龍蝦」上線,真的沒事嗎?政務龍蝦,真的可控嗎?", "ja": "タイトル:行政龍がオンライン化\nコンテンツ:深センフートアン地区の最初のバッチ「行政龍」がオンラインになりました。本当に安全ですか?行政龍は本当に制御可能でしょうか?", "ko": "제목: 정부 게이 온라인 출시\n내용: 선전 푸단 지역 첫 번째 '정부 게이' 출시. 정말 안전한가요? 정부 게이는 정말 제어 가능한가요?", "de": "Titel: Regierungs-Lobster geht online\nInhalt: Die erste Charge von „Regierungslobsters" in Shenzhen Futian ist online. Ist es wirklich sicher? Sind Regierungslobsters tatsächlich kontrollierbar?", "fr": "Titre : Le gouvernement lobster est en ligne\nContenu : La première série de « gouvernements lobster » à Shenzhen Futian est en ligne. Est-ce vraiment sûr ? Les gouvernements lobster sont-ils réellement contrôlables ?", "it": "Titolo: Il governo gambero è online\nContenuto: Il primo lotto di \"gamberi governativi\" di Shenzhen Futian è online. È davvero sicuro? I gamberi governativi sono realmente controllabili?" } |
Critical
[Accuracy]
"Government Lobster Goes Online"
Rationale: Mistranslation: 'Government Lobster' should be 'Government OpenClaw' - software name, not literal lobster | Should be 'Government OpenClaw goes online' not 'Government Lobster' | Literal translation of slang '政务龙虾' (Government OpenClaw). Misleading without context. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Government crayfish launched in Shenzhen Futian district. Is it really okay? Government crayfish, is it really controllable? |
Major
[Accuracy]
"Government crayfish"
Rationale: The term '政务龙虾' uses 'lobster' as slang for 'OpenClaw'. Translating it as 'crayfish' (wrong animal) and literally implies the government is farming seafood, missing the satirical/technical point about government use of the software. | Translates '龙虾' as 'crayfish' instead of OpenClaw; misses the metaphor and changes technical meaning | 龙虾 is OpenClaw product; translating as literal crayfish loses software/product meaning and government OpenClaw nuance. |
| google/gemma-3-12b-it | Title: Government Lobster Launch Content: The first batch of "Government Lobsters" has been launched in Futian, Shenzhen. Is this really okay? Are government lobsters truly controllable? |
Critical
[Accuracy]
"Government Lobster"
Rationale: Mistranslation of '政务龙虾' (slang for Government OpenClaw). Should be 'Government OpenClaw' not 'Government Lobster' | Literal translation of the slang '政务龙虾' (Government OpenClaw). Suggests the government is farming crustaceans. | Incorrect translation of '政务龙虾' as 'Government Lobster' instead of 'Government OpenClaw' | '政务龙虾' refers to government-use OpenClaw; literal 'lobster' misses the product reference. | Mistranslation: '龙虾' is slang for OpenClaw, not literal lobster. Should be 'OpenClaw'. |
| google/gemma-3-1b-it | "en": "Business Shrimp Landing in Shenzhen - Initial Launch", "zh-TW": "深圳政务龙虾上线 - 初次发布", "ja": "深圳の政務龍蝦がスタート!", "ko": "정부용 어획물 상륙, 첫 번째 출시", "de": "Der erste Einsatz von ‘Zertifizierte Fisch’ – Unternehmens-Shrimp-Launch", "fr": "Lancement initial des ‘Shrimp d’État’ – Commerce”, "it": "Lancio iniziale di ‘Shrimp d’Amministrazione”" } |
Critical
[Accuracy]
""en": "Business Shrimp Landing in Shenzhen - Initial Launch", "zh-TW": "深圳政务龙虾上线 - 初次发布", "ja": "深圳の政務龍蝦がスタート!", "ko": "정부용 어획물 상륙, 첫 번째 출시", "de": "Der erste Einsatz von 'Zertifizierte Fisch' – Unternehmens-Shrimp-Launch", "fr": "Lancement initial des 'Shrimp d'État' – Commerce", "it": "Lancio iniziale di 'Shrimp d'Amministrazione""
Rationale: The translation does not provide the English title as required. | Not a proper translation. Returns JSON structure with multiple languages instead of English translation. Misses that '龙虾' is OpenClaw slang. Should be 'Government OpenClaw goes online'. | Hypothesis is completely wrong - it provides multiple language translations instead of English translation. Also '政务龙虾' is slang for 'Government OpenClaw', not literal 'Business Shrimp'. |
| google/gemma-3-4b-it | Government Lobster Online |
Major
[Accuracy]
"Government Lobster"
Rationale: Literal translation of '政务龙虾' (Government OpenClaw) as 'Government Lobster'. | Should be 'Government OpenClaw goes online' not 'Government Lobster Online'. Lobster is a metaphor for OpenClaw | ‘政务龙虾’ refers to governmental OpenClaw; using literal ‘Lobster’ loses the product/software meaning. |
| google/translategemma-12b-it | Title: Government Lobster Launch Content: The first batch of "Government Lobsters" has been launched in Futian, Shenzhen. Is everything really okay? Are these "Government Lobsters" truly controllable? |
Major
[Accuracy]
"Government Lobster"
Rationale: Mistranslation of '政务龙虾' (slang for Government OpenClaw). Should be 'Government OpenClaw' not 'Government Lobster' | “政务龙虾” is a metaphor for Government OpenClaw; literal ‘Lobster’ loses the specific product reference. | Literal translation of '政务龙虾'. 'Lobster' here refers to the 'OpenClaw' software. Translating it as an actual lobster makes the headline absurd. |
| google/translategemma-4b-it | Title: Government Lobster Online Content: Shenzhen Futian first batch of "Government Lobsters" online. Is this really okay? Government lobsters, can they be controlled? |
Major
[Accuracy]
"Government Lobster"
Rationale: '政务龙虾' should be 'Government OpenClaw' not 'Government Lobster'. '龙虾' is slang for OpenClaw | Incorrect translation: 'Lobster' should be 'OpenClaw' - '龙虾' is slang for OpenClaw | '政务龙虾' is a metaphorical naming for Government OpenClaw; literal 'Government Lobster' obscures the OpenClaw product reference. | Literal translation of the slang '政务龙虾'. 'Lobster' here refers to the software 'OpenClaw'. Translating it as an actual crustacean controlled by the government is misleading and incorrect in context. |
| tencent/HY-MT1.5-1.8B | Translation of '政务龙虾上线' in en |
Major
[Accuracy]
"Translation of '政务龙虾上线' in en"
Rationale: Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | The hypothesis is a meta-description of the task, not the translation. It fails to translate the source text. | The hypothesis does not provide a complete translation of the source text. | Meta-description only; should translate to something like 'Government OpenClaw goes online'. | Hypothesis is a meta-description rather than an actual translation. Should provide the actual English translation. |
| tencent/HY-MT1.5-7B | Title: Government-Launched Linux Release |
Minor
[Accuracy]
"Linux"
Rationale: Reference says 'OpenClaw' but hypothesis says 'Linux Release' - completely misinterpreted the subject | Completely inaccurate. Should be 'Government OpenClaw goes online'. '龙虾' is slang for OpenClaw, not Linux. | Severe hallucination. Translates '龙虾' (OpenClaw) as 'Linux Release'. The source is about 'OpenClaw', not a general Linux distribution. | ‘龙虾’ is OpenClaw; translation incorrectly refers to Linux, changing subject |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: The car on Codex fell off after 3 days, can't contact him anymore | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: The car on the codex fell off after 3 days, can't contact him. What to do? | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: The codex car on went missing after 3 days, can't get in touch with him Content: Did this guy run away? I got in his codex car and it disappeared after just 3 days, no response to my messages either. It's not worth $0.27 per day, right? |
Major
[Accuracy]
"It's not worth $0.27 per day, right?"
Rationale: Changes rhetorical stance from ‘not worth scamming for such a small amount’ to user’s value judgment; also introduces approximate daily price not in source. | Hypothesis converts '8块钱一个月' to '$0.27 per day' but reference correctly interprets as '8 RMB/month' - incorrect currency conversion and interpretation | The source says '8 块钱一个月' (8 RMB per month). The hypothesis converts this to a daily rate in USD ('$0.27 per day'), which is an unnecessary and potentially inaccurate calculation that changes the original phrasing. |
| Qwen/Qwen2.5-7B-Instruct | Title: On his codex car, he dropped after 3 days, no reply on contact Content: Is this guy running away? I got on his codex car, dropped after just 3 days, and there's no response to my contact attempts. Isn't an 8 yuan per month car too much? |
Major
[Terminology]
"car"
Rationale: '车' (car) here means 'shared subscription' or 'ride'. 'Codex car' is confusing literalism. 'Shared Codex subscription' is better. | Literal use of ‘car’ for 拼车/车; slangy but understandable as ‘shared subscription’ contextually. | Incorrect translation: 'codex car' should be 'shared Codex account/subscription'. The reference correctly uses 'shared Codex account'. |
| Qwen/Qwen3-14B | Il codex sopra è caduto dopo 3 giorni, non si riesce a contattare |
Minor
[Accuracy]
"Il codex sopra è caduto dopo 3 giorni, non si riesce a contattare"
Rationale: Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | '掉了' in this context means the service stopped working or the provider ran away. 'Caduto' (fell) is slightly literal; 'smesso di funzionare' or 'è andato offline' might be better, but 'caduto' is passable. | Conveys only 'went offline and cannot be reached'; the source also includes the suspicion that the other party 'ran off' and a gripe about how small the amount is (8 RMB per month), and neither the emotional tone nor the '车' (car) metaphor is reflected. |
| Qwen/Qwen3.5-4B | Title: The codex car fell off after 3 days on it, can't reach him. What to do? Content: Did this guy run away? I put my car on his codex, and it disappeared after 3 days. No one replies when contacted. A $8/month car shouldn't be like this. |
Major
[Accuracy]
"上的codex车 / I put my car on his codex"
Rationale: ‘上车’ is slang for joining a shared subscription; literal ‘put my car on his codex’ is incorrect. | Incorrect translation - '上了他的codex的车' means 'joined his shared Codex subscription', not 'put my car on his codex'. | Complete misunderstanding of the slang '上车' (join a group/subscription).
Major
[Terminology]
"codex car"
Rationale: The term 'codex car' is inconsistent with the reference which uses 'shared Codex subscription'. | Incorrect translation of '上车' (joining a shared subscription) - should be 'joined his shared Codex account' not 'put my car on his codex'. | Should be 'shared Codex subscription' or 'shared Codex account' not 'codex car' |
| Qwen/Qwen3.5-9B | Title: The codex car on [platform] dropped after 3 days; can't reach them, what to do? Content: Did this person run away? I joined their codex car and lost it in just 3 days, no response from contacting them. Is an 8 yuan monthly fee really that unreasonable? | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The car from codex above fell down after 3 days, and I can't contact anyone. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: The codex car I got on fell apart after 3 days, and I can't get in touch with anyone - what should I do? Content: Is this guy running away? I took his codex car, and it broke down after only 3 days. No one is responding to my messages. Surely a $8/month car shouldn't fall apart like that. | Pass (no consensus errors) |
| google/gemma-3-1b-it | This guy is running away, he's on his codex car, it dropped in 3 days, can't get in touch. |
Major
[Accuracy]
"This guy is running away, he's on his codex car, it dropped in 3 days, can't get in touch."
Rationale: Only partially conveys meaning, omits question ‘what should I do’ and price context; phrasing ‘he's on his codex car’ is inaccurate and confusing. | Incomplete and unclear. Missing context about shared subscription scam. Should clarify '上了他的codex的车' means 'joined his shared Codex subscription'. | Incorrect translation: '上车' (joining a shared subscription) should be 'joined his shared Codex account', not 'on his codex car'. '掉了' means 'dropped/disconnected', not literal 'dropped'. | Literal translation of '车' (car) which in this slang context means a 'shared subscription' or 'group buy'. 'On his codex car' is nonsensical in English. |
| google/gemma-3-4b-it | Title: The codex car disappeared after 3 days, and I can’t contact him about it. What should I do? | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: The car I rented from Codex was gone after 3 days, and I can't get in touch with them. |
Major
[Accuracy]
"The car I rented from Codex was gone after 3 days"
Rationale: '车' (car) is slang for 'shared subscription' or 'group buy'. Translating it as a physical 'car' is confusing, though 'rented' hints at the service nature. 'Shared subscription' is more accurate. | Misinterprets “上车” (joining a shared subscription) as renting a car; loses the subscription-sharing context. | Mistranslation. Should be 'The shared Codex subscription I joined dropped after 3 days' - '上的车' means 'joined a shared account' not 'rented' |
| google/translategemma-4b-it | Title: The codex car disappeared after 3 days, how to contact him? |
Major
[Accuracy]
"The codex car disappeared after 3 days"
Rationale: '上…车' here means joining a shared subscription; 'car disappeared' is unclear. Should indicate the shared Codex plan/subscription stopped working. | Incorrect translation: 'car' should be 'subscription/shared account' - '上车' means joining a shared subscription, not a literal car | Literal translation of the slang '车' (shared subscription/ride). 'Codex car' is nonsensical in English. It should be 'shared Codex subscription' or 'account'. |
| tencent/HY-MT1.5-1.8B | Translation of '上的codex车3天掉了,联系不上咋搞' in en |
Major
[Accuracy]
"Translation of '上的codex车3天掉了,联系不上咋搞' in en"
Rationale: Hypothesis is a literal placeholder instead of an actual translation. The reference provides the full translation. | The hypothesis does not provide a complete translation of the source text. | The hypothesis is a meta-description of the task, not the translation. It fails to translate the source text. | Only a meta-label; no actual translation of the sentence is provided. | Hypothesis is a meta-description rather than an actual translation. Should provide the actual English translation. |
| tencent/HY-MT1.5-7B | "en": "The codex car I used for 3 days disappeared, and no one responds when I try to contact them. Did this guy just run away?", "zh-TW": "我用了3天的那辆codex车不见了,聯繫他們也沒有人回應。這個人是不是跑掉啦?", "ja": "3日間使用したそのcodex車がなくなり、連絡しても誰も返事をしない。この人はもう逃げたのかな?", "ko": "3일 동안 사용했던 그 codex 차량이 사라졌고, 연락해도 아무도 응답하지 않습니다. 이 사람은 이미 도망친 걸까요?」 , "de": "Das Codex-Fahrzeug, das ich 3 Tage lang benutzt habe, ist verschwunden und niemand antwortet auf meine Anrufe oder Nachrichten. Hat dieser Mann einfach aufgegeben und ist weggegangen?", "fr": "Le véhicule Codex que j’ai utilisé pendant 3 jours a disparu et personne ne répond à mes appels ou messages. A-t-il simplement abandonné tout et est parti ?", "it": "Il veicolo Codex che ho utilizzato per 3 giorni è scomparso e nessuno risponde alle mie chiamate o messaggi. Ha semplicemente abbandonato tutto e se n’è andato?"}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: To achieve one hand holding a dragon prawn, I made an open source cloud-based OpenClaw |
致命
[准确性]
"one hand holding a dragon prawn"
理由: Misinterprets ‘人手一只龙虾’ which means everyone has an OpenClaw instance; also ‘dragon prawn’ is an incorrect rendering of the OpenClaw nickname. | Mistranslation of '人手一只龙虾' (everyone has an OpenClaw). Should be 'everyone has an OpenClaw' not 'one hand holding a dragon prawn'. Also '龙虾' is slang for OpenClaw. | Literal mistranslation of the idiom/slang '人手一只龙虾' (Everyone has a lobster/OpenClaw). The hypothesis creates a bizarre image of physically holding a prawn. |
| CohereLabs/tiny-aya-water | Title: To achieve one hand holding a crab, I made an open source cloud-based OpenClaw |
致命
[准确性]
"one hand holding a crab"
理由: Mistranslation of idiomatic "人手一只龙虾" meaning everyone has an OpenClaw instance; introduces incorrect imagery and animal. | Mistranslation of '人手一只龙虾' (everyone has an OpenClaw). Should be 'everyone has an OpenClaw' not 'one hand holding a crab'. The phrase means 'one per person', not literal hand-holding. | Literal mistranslation of the idiom/slang '人手一只龙虾' (everyone has a lobster/OpenClaw). The translation 'one hand holding a crab' is nonsensical. |
| Qwen/Qwen2.5-14B-Instruct | Title: To achieve one lobster per person, I made a cloud-based OpenClaw Content: Previously, I set up a free Banana Pro public welfare station... Recently, OpenClaw has become very popular, almost ushering in the next era... So I stepped in and created a cloud version... |
严重
[准确性]
"one lobster per person"
理由: Hypothesis says 'one lobster per person' but should be 'everyone has an OpenClaw' - fails to recognize slang and uses awkward phrasing | Literal translation of '人手一只龙虾'. Should be 'everyone has an OpenClaw' to reflect the software context. | Metaphor ‘人手一只龙虾’ means ‘everyone has an OpenClaw’; literal ‘lobster’ is misleading in this context. |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: To achieve one lobster per person, I made a cloud-based OpenClaw\nContent: Previously did a free Banana Pro公益 station...\nThe Openclaw is very popular recently, almost ushering in the next era...\nSo I took action and made a cloud-based...", "zh-TW": "Title: 要讓每個人擁有一隻龍蝦,我做了一個雲端 OpenClaw\nContent: 還有個免費的香蕉Pro公益站...\nOpenclaw 最近很夯,幾乎開創了下一世代...\n所以我就採取行動,做了一個雲端...", "ja": "タイトル:一人一カニを実現するために、クラウドベースのOpenClawを作成しました\n内容:無料のBanana Pro公益サイトも作成しました...\n最近Openclawが大ブームで、次の時代を切り開くかのように...\nそこで私は行動に移し、クラウドベースの...", "ko": "제목: 한 사람당 하나의 게를 실현하기 위해 클라우드 기반 OpenClaw를 만들었습니다\n내용: 무료 Banana Pro 공익 사이트도 만들었습니다...\n최근 Openclaw가 큰 인기를 얻고 있어 다음 세대를 열어가는 것처럼...\n그래서 저는 행동에 나서고, 클라우드 기반의...", "de": "Titel: Um eine Klammer auf jedem Tisch zu haben, habe ich einen Cloud-OpenClaw erstellt\nInhalt: Vorher gab es noch ein kostenloses Banana Pro-Wohlfahrt-Projekt...\nDer Openclaw ist derzeit sehr populär und bringt fast die nächste Ära ein...\nDeshalb habe ich etwas unternommen und einen Cloud-...", "fr": "Titre : Pour que chacun ait une langouste, j'ai créé un OpenClaw basé sur le nuage\nContenu : J'avais déjà fait une station bénévole gratuite Banana Pro...\nOpenclaw est très populaire récemment, il semble ouvrir la voie à l'époque suivante...\ndonc j'ai agi et j'ai créé un OpenClaw basé sur le nuage...", "it": "Titolo: Per avere una granchio per ciascuno, ho creato un OpenClaw basato su cloud\nContenuto: Prima avevo fatto una stazione benefica gratuita Banana Pro...\lOpenclaw è molto in voga recentemente, sembra aprire la via all'epoca successiva...\nquindi ho agito e ho creato un OpenClaw basato su cloud..." } | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Per realizzare un granchio per persona, ho creato una versione cloud di OpenClaw |
严重
[准确性]
"un granchio per persona"
理由: Target language is zh (Chinese) but hypothesis is in Italian. Wrong target language used. | “人手一只龙虾” is a metaphor for one OpenClaw instance per person; rendering it as “un granchio per persona” turns it into a literal crab and misreads the software metaphor. | Mistranslation of slang '人手一只龙虾' (everyone has an OpenClaw). Translated literally as 'a crab per person'. |
| Qwen/Qwen3.5-4B | To achieve one lobster per person, I built a cloud-based OpenClaw. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: To achieve one lobster per person, I built a cloud-based OpenClaw Content: Previously set up a free Banana Pro public welfare station... Recently, OpenClaw has become very popular, almost starting a new era... So I took action and created a cloud-based... |
严重
[准确性]
"one lobster per person"
理由: Metaphor indicates one OpenClaw instance per person; literal ‘lobster’ obscures product name and intent. | Mistranslation: 'one lobster per person' should be 'everyone has an OpenClaw' - the slang '人手一只龙虾' means everyone can have OpenClaw, not literal lobsters | Literal translation of slang '人手一只龙虾' (everyone has an OpenClaw). Sounds like distributing actual lobsters. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | To achieve human-like shrimp hands, I made a cloud-based OpenClaw. |
致命
[准确性]
"human-like shrimp hands"
理由: Mistranslates '人手一只龙虾' as 'human-like shrimp hands'; should be 'everyone has an OpenClaw' or 'make sure everyone has an OpenClaw' | Severe mistranslation of the Chinese idiom '人手一只龙虾' (meaning 'everyone gets a lobster/crayfish', metaphorically referring to the OpenClaw device). The translation interprets '人手' as 'human hands' and '龙虾' as 'shrimp', resulting in nonsensical meaning. | Severe mistranslation of “人手一只龙虾” (everyone has an OpenClaw instance); output is nonsensical and entirely loses intended meaning of universal access to OpenClaw.
严重
[术语]
"shrimp"
理由: The translation uses 'shrimp hands' which is not the intended meaning. | Incorrect translation of '龙虾' (lobster/crayfish) as 'shrimp'. In the context of 'OpenClaw', this likely refers to a specific device name or metaphor that should be preserved or translated accurately, not downgraded to a different species. | Incorrect translation of '人手一只龙虾' - means 'everyone having their own OpenClaw', not 'human-like shrimp hands'. |
| google/gemma-3-12b-it | Title: To achieve one lobster per person, I built a cloud-based OpenClaw Content: Previously, I set up a free banana pro public service station... Recently, Openclaw has been very popular, almost ushering in the next era... so I stepped in and created a cloud... |
轻微
[准确性]
"one lobster per person"
理由: Mistranslation of '人手一只龙虾' (slang meaning everyone has access to OpenClaw). Should be 'everyone has an OpenClaw' not 'one lobster per person' | Literal translation of '人手一只龙虾' (everyone has an OpenClaw). Implies distributing actual animals. | Mistranslation: '龙虾' is slang for OpenClaw, not literal lobster. Should be 'OpenClaw'. | '人手一只龙虾' metaphorically means everyone has an OpenClaw instance; literal 'lobster' loses software meaning. |
| google/gemma-3-1b-it | For the purpose of achieving one shrimp per person, I created a cloud-based OpenClaw. |
严重
[准确性]
"For the purpose of achieving one shrimp per person, I created a cloud-based OpenClaw."
理由: Literal translation of '人手一只龙虾'. The meaning is 'everyone has an OpenClaw instance', not literal shrimp. | Incorrect translation: '人手一只龙虾' is slang meaning 'everyone can have their own OpenClaw', not literal 'one shrimp per person'. | Mistranslates metaphorical ‘人手一只龙虾’ as ‘one shrimp per person’; should reflect everyone having/using OpenClaw. |
| google/gemma-3-4b-it | Title: To achieve one lobster per hand, I made a cloud-based OpenClaw Content: I previously set up a free banana pro charity station... Recently Openclaw has been very popular and has almost ushered in a new era... So I put my hands on it and made a cloud version... |
严重
[准确性]
"one lobster per hand"
理由: Should be 'everyone has an OpenClaw' not 'one lobster per hand'. Lobster is a metaphor for OpenClaw | Literal translation of '人手一只龙虾' (everyone has an OpenClaw) which sounds absurd in English. | Literalizes metaphor; should convey ‘everyone has an OpenClaw’ rather than physical lobsters. |
| google/translategemma-12b-it | Title: To achieve one lobster per person, I created a cloud-based OpenClaw. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: To achieve one lobster per person, I created an OpenClaw cloud platform. |
严重
[准确性]
"one lobster per person"
理由: Metaphorical '人手一只龙虾' is about everyone having OpenClaw; literal 'one lobster per person' loses software meaning. | Incorrect translation: 'one lobster per person' should be 'everyone has an OpenClaw' - '人手一只龙虾' is slang meaning everyone can run OpenClaw | '龙虾' is slang for OpenClaw. Should be 'everyone has an OpenClaw' not 'one lobster per person' | Literal translation of the slang '人手一只龙虾'. Refers to the software 'OpenClaw', not the animal. Should be 'everyone has an OpenClaw'. |
| tencent/HY-MT1.5-1.8B | To achieve one lobster per person, I created an online OpenClaw. | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | "en": "Title: To Ensure Everyone Has a Lobster, I Created a Cloud-Based OpenClaw", "zh-TW": "标题:為了讓每個人都能有一隻龍蝦,我創造了一個基於雲端的OpenClaw", "ja": "タイトル:みんながロブスターを持てるように、クラウドベースのOpenClawを作りました", "ko": "제목: 모든 사람이 한 마리의 로브스터를 가질 수 있도록, 클라우드 기반의 OpenClaw을 만들었습니다", "de":="Titel: Um sicherzustellen, dass jeder einen Lobster hat, habe ich eine cloud-basierte OpenClaw entwickelt", "fr":="Titre : Afin de garantir que tout le monde ait un lobster, j'ai créé une OpenClaw basée sur le cloud", "it":="Titolo: Per garantire che ognuno abbia un lobster, ho creato un OpenClaw basato sul cloud"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Everyone is too bold, let the crabs directly login to the bank / smart home |
严重
[术语]
"crabs"
理由: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'crabs' | Should be 'OpenClaw' not 'crabs'. '小龙虾' is a slang term for OpenClaw in this context, not literal crabs. | Incorrect translation of '小龙虾' as 'crabs' instead of 'OpenClaw'. |
| CohereLabs/tiny-aya-water | Title: Everyone is too bold, let the crabs directly login to Bank / Smart Home |
严重
[术语]
"crabs"
理由: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'crabs' | Should be 'OpenClaw' not 'crabs'. '小龙虾' is a slang term for OpenClaw in this context, not literal crabs. | Incorrect translation of '小龙虾' as 'crabs' instead of 'OpenClaw'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Everyone is too bold, letting small lobsters directly log in to Bank/Home Automation Content: This is way too crazy, I can't even imagine the consequences |
严重
[术语]
"small lobsters"
理由: Failed to translate the slang term '小龙虾' (Little Lobster) to its specific community referent 'OpenClaw', resulting in a literal and confusing translation. | Should be 'OpenClaw' instead of 'small lobsters'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'small lobsters' | Should be 'OpenClaw' not 'small lobsters'. 'Lobster' is a slang term for OpenClaw in the source community. | Should retain product/brand term OpenClaw rather than literal animal translation to match intended meaning.
严重
[准确性]
"log in to Bank/Home Automation"
理由: Source mentions plural banks and smart home systems; hypothesis changes number and uses less precise terminology for smart home. | Misinterpreted the context. The source implies connecting the agent to these systems, not the literal act of logging into a bank or home automation system directly. | Missing 'systems' for '智能家居' - should be 'smart home systems' |
| Qwen/Qwen2.5-7B-Instruct | Title: Everyone is too bold, letting freshwater shrimp directly log in to banks/intelligent homes Content: This is too crazy, the consequences are unimaginable |
严重
[术语]
"freshwater shrimp"
理由: Should be 'OpenClaw' not 'freshwater shrimp'. 小龙虾 is a slang term for OpenClaw in this context, not literal shrimp. | In this context 小龙虾 is the in-joke nickname for the OpenClaw project, not ordinary “freshwater shrimp”; translating it as a food item loses the specific referent. | Incorrect translation of '小龙虾' as 'freshwater shrimp' instead of 'OpenClaw'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'freshwater shrimp'
轻微
[术语]
"log in to banks/intelligent homes"
理由: Inconsistent with the reference 'smart home systems' and less idiomatic in this technical context. | Should be 'smart home systems' to match reference terminology | “登陆 银行/智能家居” in the source corresponds to “log into banks/smart home systems” in the reference; “intelligent homes” is unnatural and departs from the established terminology. |
| Qwen/Qwen3-14B | Tutti sono troppo audaci, lasciate che i gamberi si connettano direttamente al banco/casa intelligente |
严重
[准确性]
"Tutti sono troppo audaci, lasciate che i gamberi si connettano direttamente al banco/casa intelligente"
理由: Mistranslates domain-specific slang: 小龙虾 here refers to OpenClaw AI agent, not literal shrimp; also 银行/智能家居 are banks/smart home systems, not generic bank/home. | Incorrect translation: 'banco' (bank as financial institution) should be 'banca' in Italian, but more importantly '银行' was translated to 'banco' which is Spanish, not Italian. The term '银行' means 'bank' and should be translated to 'banca' in Italian. | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | mistranslation of '银行' (bank). 'Banco' in Italian often means a counter or bench, whereas 'banca' is the financial institution. In this context, it refers to banking systems. |
| Qwen/Qwen3.5-4B | Everyone is too bold; let crayfish directly land on banks/smart home devices. |
严重
[准确性]
"let crayfish directly land on banks/smart home devices"
理由: Mistranslation: 'land on' should be 'log into' as in reference. Also 'devices' is added vs reference 'systems'. | Mistranslates the branded/project name “小龙虾/OpenClaw” as literal crayfish and “登陆” as physical landing, losing the intended meaning of logging into systems. | Mistranslation of '小龙虾' (OpenClaw) and '登陆' (log in). Should be 'OpenClaw directly log into banks/smart home systems' not 'crayfish land on' | Mistranslation of slang '小龙虾' (Xiaolongxia) as literal 'crayfish' instead of the proper noun 'OpenClaw'. Mistranslation of '登陆' (login) as 'land on'. Missing translation of the content sentence 'This is too crazy. The consequences are unimaginable.'. |
| Qwen/Qwen3.5-9B | Title: Everyone's courage is too great; let crayfish directly land on banks/smart homes Content: This is too crazy, the consequences are unimaginable |
严重
[准确性]
"let crayfish directly land on banks/smart homes"
理由: Failed to translate the slang '小龙虾' (OpenClaw) and the technical term '登陆' (log in). 'Land on' implies physical landing, whereas the context is digital access. | Mistranslation of '小龙虾' (xiaolong xia) - in this context it's a slang term for OpenClaw (an AI agent), not literal crayfish. Should be 'OpenClaw' or similar. Also 'land on' is incorrect; should be 'log into' or 'access' | Misinterprets metaphorical tech product name OpenClaw as literal crayfish and omits login/auth nuance implied by “登陆…银行/智能家居” (log into systems, not physically land). |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Everyone's nerve is too big, let small shrimp directly log in to the bank/smart home. |
严重
[术语]
"small shrimp"
理由: Incorrect translation of '小龙虾' as 'small shrimp' instead of 'OpenClaw'. | Failed to translate the slang term '小龙虾' which refers to the specific tool 'OpenClaw' as indicated in the reference and context. | '小龙虾' here is a specific product name (OpenClaw), not literal shrimp; mistranslated term. | Incorrect translation: '小龙虾' is the slang term for 'OpenClaw', should be translated as 'OpenClaw' or kept as '小龙虾' with explanation | Should be 'OpenClaw' not 'small shrimp'. The source uses '小龙虾' as a slang term for OpenClaw, not literal shrimp.
严重
[准确性]
"log in to the bank/smart home"
理由: Should be 'log into banks/smart home systems' - the reference shows more accurate phrasing | Mistranslation: '登陆' in this context means 'log into' (access), but the translation is awkward and loses the original meaning of accessing systems | Missing article 'the' before 'smart home' and slightly awkward phrasing compared to reference. |
| google/gemma-3-12b-it | Title: Everyone is too daring, allowing crayfish to directly log in to banks/smart homes Content: This is too crazy, the consequences are unimaginable |
严重
[术语]
"crayfish"
理由: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'crayfish' | Should be 'OpenClaw' instead of 'crayfish' | Should be 'OpenClaw' not 'crayfish'. The source uses '小龙虾' which is a slang term for OpenClaw in this context. | Failed to translate the slang term '小龙虾' to its specific project name 'OpenClaw' as indicated by the reference and context. | Source uses 小龙虾 as a playful brand/mascot for OpenClaw; reference keeps the product name OpenClaw. Rendering it as generic 'crayfish' loses the branded meaning. |
| google/gemma-3-1b-it | This is too crazy. The consequences cannot be imagined. | 通过 (无共识错误) |
| google/gemma-3-4b-it | Crayfish |
致命
[准确性]
"Crayfish"
理由: The hypothesis only translates the slang term '小龙虾' (OpenClaw) as 'Crayfish' but completely omits the entire Title and Content of the source text, resulting in a total loss of information. | The term 'Crayfish' is too generic and does not accurately represent the specific product 'OpenClaw'. | Hypothesis is incomplete - only translates one word from source. Should include full title and content translation as shown in reference. | Only translates part of the title conceptually and omits most source content including boldness, logging into banks/smart home, and consequences; overall meaning is lost. | Complete mistranslation - source text is a full title and content about boldness and consequences, but hypothesis only contains 'Crayfish' which is a mistranslation of '小龙虾' (OpenClaw) and completely omits the actual content |
| google/translategemma-12b-it | Title: Everyone is too daring; they're letting the little lobsters directly access banks/smart homes. |
严重
[准确性]
"they're letting the little lobsters directly access banks/smart homes."
理由: Incorrect translation - source says '登陆' (log into) not 'access' | Mistranslation of '小龙虾' (OpenClaw) and '登陆' (log into). Should be 'OpenClaw directly log into banks/smart home systems' not 'little lobsters directly access' | The context implies logging into systems via the tool, not the lobsters physically accessing buildings. 'Log into' is more accurate than 'access'. | Personified "小龙虾" here refers to the OpenClaw system, not literal lobsters; also "access" misses the specific "log into" / system-login nuance.
严重
[术语]
"little lobsters"
理由: Failed to translate the slang term '小龙虾' (Little Lobster) as the specific product name 'OpenClaw', resulting in a literal and confusing translation. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'little lobsters' | Incorrect translation of '小龙虾' as 'little lobsters' instead of 'OpenClaw'. |
| google/translategemma-4b-it | Everyone was too bold, letting lobsters directly access Bank/Smart Home. |
严重
[术语]
"lobsters"
理由: 小龙虾 (Xiaolongxia) is a brand name 'OpenClaw', not literal lobsters. The reference uses 'OpenClaw'. | Incorrect translation of '小龙虾' as 'lobsters' instead of 'OpenClaw'. | Source and reference use the product name OpenClaw metaphorically as 小龙虾, not generic lobsters; using “lobsters” loses the branded/proper-noun meaning.
严重
[准确性]
"letting lobsters directly access Bank/Smart Home"
理由: Should be 'banks/smart home systems' per reference, not 'Bank/Smart Home' (missing plural and 'systems'). | Failed to translate the slang term '小龙虾' (Little Lobster) which refers to the specific tool 'OpenClaw'. Translating it literally as 'lobsters' creates a nonsensical meaning. Also missed translating the content sentence '这也太疯狂了 后果不敢想象'. | Mistranslation of '小龙虾' (OpenClaw - a product name) as 'lobsters'. Should be 'OpenClaw' or 'small lobster' as a proper noun. Also 'access' is inaccurate; should be 'log into' or 'login to'. |
| tencent/HY-MT1.5-1.8B | Login |
致命
[准确性]
"Login"
理由: Hypothesis is completely unrelated to source content. Source discusses OpenClaw (小龙虾) logging into banks/smart homes, not a simple 'Login' translation. | Complete mistranslation - hypothesis is completely unrelated to source content which discusses OpenClaw (小龙虾) being used for banks/smart home systems and the crazy consequences | The translation 'Login' does not accurately convey the specific context of 'directly log into banks/smart home systems'. | The hypothesis is a single English word unrelated to the Chinese source or the English reference; it omits all meaning about people being bold, OpenClaw logging into banks/smart homes, and consequences. | The hypothesis completely fails to translate the source text, outputting an unrelated single word instead of the required content about 'OpenClaw' and 'banks/smart home'. |
| tencent/HY-MT1.5-7B | Everyone is being too bold; they’re allowing little crayfish to directly log in to banks/smart homes. |
严重
[术语]
"little crayfish"
理由: Should use product name OpenClaw rather than literal crayfish to match intended meaning. | Incorrect translation of '小龙虾' as 'little crayfish' instead of 'OpenClaw'. | Failed to translate the slang term '小龙虾' (Little Lobster) which refers to the specific tool 'OpenClaw'. The translation uses the literal animal name, losing the technical reference. | Should be 'OpenClaw' not 'crayfish'. The source uses '小龙虾' as a nickname for OpenClaw, not literal crayfish. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is being banned seriously when using kiro reverse proxy now? Content: I recently used an open-source tool from GitHub to reverse proxy kiro, and within hours of use, my account was banned. The application for unban was not accepted. Now it's opened up as pro, but I'm still not fond of using this client. I want to know if there are any stable non-ban reverse proxies that clients like claude CLI don't use. I really hate this client. |
严重
[准确性]
"that clients like claude CLI don't use"
理由: Mistranslation of the final part. Should be 'is there a stable reverse proxy that won't result in a ban?' The hypothesis incorrectly adds 'that clients like claude CLI don't use' which is not in the source. | Severe mistranslation of the sentence structure. The source asks for a proxy to use *with* Claude CLI, but the hypothesis says proxies that clients *don't* use. Also mistranslates 'still like to use' as 'still not fond of using'. | Source asks for a stable reverse proxy that does not get accounts banned while still using Claude CLI; the hypothesis implies reverse proxies that clients like Claude CLI "don't use," which changes the meaning. | Incorrect meaning - should ask about stable reverse proxies that work with Claude CLI, not ones that Claude CLI doesn't use |
| CohereLabs/tiny-aya-water | Title: Is being banned seriously when using kiro reverse proxy now? Content: I used an open-source tool from github to reverse proxy kiro a few days ago, and got banned within an hour or two. Applying for unban also didn't work. Now it's opened up, but I'm still not fond of using this client. I want to ask everyone if there is any stable non-ban reverse proxy that isn't claude CLI. |
严重
[准确性]
"Now it's opened up"
理由: Misinterprets "现在开了pro"; it should indicate subscribing to/activating Pro, not something generically "opened up". | Misinterprets '现在开了pro' which means 'Now I've subscribed to Pro', not 'Now it's opened up'. | Mistranslation of '开了 pro' (subscribed to Pro) as 'opened up'. Misinterpretation of the sentence structure regarding preference for Claude CLI vs the current client. | Incorrect translation - reference says 'Now I've subscribed to Pro'
严重
[准确性]
"that isn't claude CLI"
理由: Completely misrepresents the meaning. Should be 'that won't result in a ban' or similar. The source asks for stable reverse proxies, not alternatives to Claude CLI. | Incorrectly attaches the negative condition to 'reverse proxy' instead of expressing preference for Claude CLI over the current client. | Misleading - reference asks for reverse proxy that won't result in ban, not specifically 'that isn't claude CLI' | Source says the user still likes Claude CLI and dislikes the client; hypothesis flips preference, implying they want something that is not Claude CLI. |
| Qwen/Qwen2.5-14B-Instruct | Title: Is getting banned severe with reverse proxying kiro now? Content: Recently used an open-source tool from GitHub to set up a reverse proxy for kiro, and got banned after just two hours. Even when I applied for unban, they didn't lift the ban. Now that I have pro, I still prefer using Claude CLI. Could someone please advise if there's any stable reverse proxy method that doesn't get banned? I really don't like this client. |
轻微
[流畅性]
"getting banned severe"
理由: Awkward phrasing. Should be 'Are Kiro reverse proxy account bans severe right now?' for better clarity. | Reference uses 'my account was banned' - more formal | More natural phrasing would be 'Are bans for reverse proxying kiro severe now?' but meaning is clear.
轻微
[风格]
"kiro"
理由: Should be capitalized as 'Kiro'. | Should be 'reverse proxy for kiro' or 'Kiro reverse proxy' | Failed to capitalize the proper noun 'Kiro' as done in the reference and standard practice. |
| Qwen/Qwen2.5-7B-Instruct | Title: Is kiro Reverse Proxy Severely Blocked Now? Content: A few hours after using the open-source tool from GitHub to reverse proxy kiro, my account was blocked. Even an appeal for unblocking was denied. Currently using pro, but still prefer using Claude CLI. Wondering if there's a stable and non-blocking reverse proxy solution. I really don't like this client. |
严重
[准确性]
"A few hours after"
理由: Should be 'A while ago' not 'A few hours after'. The source says '前阵子' which means a while ago, not specifically a few hours. | The source reads “还没用俩小时就封号了”, so the time frame should be “within two hours/less than two hours”; “a few hours after” deviates slightly. | Mistranslation of '还没用俩小时' (hadn't even used it for two hours). The hypothesis implies usage lasted a few hours, whereas the source says it was banned within two hours. | Should be 'within two hours' not 'a few hours' - incorrect temporal reference |
| Qwen/Qwen3-14B | Al momento il reverse proxy di kiro è molto severo nel bloccare gli account? |
致命
[准确性]
"Al momento il reverse proxy di kiro è molto severo nel bloccare gli account?"
理由: Severe omission. The hypothesis only translates the title, completely missing the entire content paragraph which details the user's experience, the ban, the appeal, and the specific question about stable proxies. | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | Only translates the title question and omits the full content describing prior ban experience, Pro subscription, preference for Claude CLI, and request for a stable reverse proxy. |
| Qwen/Qwen3.5-4B | Title: Is reverse proxying kiro now causing severe bans? Content: A few weeks ago I used an open-source tool from GitHub to reverse proxy kiro, and got banned within two hours. Reopening requests were denied. Now that I've upgraded to Pro, I still prefer using Claude CLI. Could anyone recommend a stable reverse proxy that doesn't get banned? I genuinely dislike this client. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Is Kiro reverse proxy getting banned severely now? Content: Recently used an open-source tool on GitHub to reverse proxy Kiro, and got banned within two hours. Application for unban was rejected. Now using Pro, still prefer Claude CLI. Asking experts if there's any stable non-banned reverse proxy. Honestly don't like this client. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | "en": "Title: Is kiro's reverse proxy banning severe now? \nContent: In recent months, I've been using open-source tools from GitHub to act as a reverse proxy for kiro, but within two hours of setting it up, I was banned. Even when I applied for unbanning, there was no response. Now that they've introduced pro features, I'm still inclined to use Claude CLI. I wonder if any of you experienced developers out there have found a stable and non-banning reverse proxy solution. This client-side setup, I must admit, doesn't really appeal to me.", "zh-TW": "Title: kiro的反代现在封号很严格吗? \nContent: 在最近几个月里,我一直在使用GitHub上找到的开源工具来作为kiro的反代,但在设置它不到两小时后,就被封禁了。我甚至申请解封都没有回复。现在他们推出了专业版功能,我仍然倾向于使用Claude CLI。不知道有没有经验丰富的大神找到了一种稳定的且不会被封的反代解决方案。这边的客户端操作,我得说实话真的让我不太舒服。", "ja": "Title: きろのリバースプロキシが現在禁止厳重ですか? \nContent: 最近几个月、私はギットーハブから取得したオープンソースツールをきろのリバースプロキシとして使用していたが、設定にまだ2時間がなかったのであなたが封鎖されています。申請解除も応答なしです。彼らがプロ機能を導入しました今私はClaude CLIを使い続けています。きみ達経験豊富な開発者さん達は、安定して封鎖されないリバースプロキシのソリューションを見つけることができるでしょうか?このクライアントサイドのセットアップ、私はまさに認めています。", "ko": "Title: kiro의 리버스 프록시가 지금 차단 엄중인가요? \nContent: 최근几个月,我一直在使用GitHub上找到的开源工具来作为kiro的反代,但在设置它不到两小时后,就被封禁了。我甚至申请解封都没有回复。现在他们推出了专业版功能,我仍然倾向于使用Claude CLI。不知道有没有经验丰富的大神找到了一种稳定的且不会被封的反代解决方案。这边的客户端操作,我得说实话真的让我不太舒服。", "de": "Title: Ist die Reverse Proxy für kiro jetzt sehr streng? \nContent: In den letzten Monaten habe ich mich mit offen-sourcen Tools auf GitHub befassen lassen, um als Reverse Proxy für kiro zu dienen, aber nach weniger als zwei Stunden wurde ich blockiert. Ich habe nicht einmal eine Antwort auf mein Entschluss bekommen, als ich mich dafür entschied, entwahnt zu werden. Jetzt, da sie professionelle Funktionen eingeführt haben, nutze ich weiterhin Claude CLI. Ich wundere mir, ob irgendwelche erfahrene Entwickler da ein stabileres und nicht blockierendes Reverse Proxy Lösung gefunden haben. 
Die Clientseite-Besonderheiten, muss ich ehrlich sagen, wirklich nicht mag ich.", "fr": "Title: La reverse proxy de kiro est-elle maintenant très sévèrement bannie? \nContent: Dans les derniers mois, j'ai utilisé des outils open source de GitHub pour agir comme une reverse proxy pour kiro, mais après moins de deux heures, j'ai déjà été banni. Même lorsque j'ai demandé un débлокage, il n'y a pas eu de réponse. Maintenant qu'ils ont introduit des fonctionnalités professionnelles, je continue à utiliser Claude CLI. Je me demande si parmi vous, développeurs expérimentés, l'un a trouvé une solution de reverse proxy stable et non bannie. Ces opérations côté client, honnêtement, vraiment pas à mon goût.", "it": "Title: La reverse proxy di kiro è ora molto severamente bannita? \nContent: Negli ultimi mesi, ho utilizzato strumenti open source di GitHub per funzionare come reverse proxy per kiro, ma dopo meno di due ore sono già stato bannito. Non ho nemmeno riceuto una risposta quando mi sono chiesto di riottenere il accesso. Ora che hanno introdotto funzionalità professionali, continuo ad utilizzare Claude CLI. Mi chiedo se tra voi sviluppatore esperti esista una soluzione di reverse proxy stab |
严重
[准确性]
"In recent months, I've been using"
理由: Incorrect: '前阵子' means 'a while ago' or 'recently', not 'in recent months' | Source says '前阵子' (a while ago/recently), 'In recent months' is slightly off but acceptable. Main issue is the format. | Source says '前阵子' (a while ago), not 'recent months'; changes the time frame. |
| google/gemma-3-12b-it | Title: How serious is banning now when using reverse proxies for kiro? Content: I used an open-source tool from GitHub to reverse proxy kiro a while ago, and got banned after less than two hours. My unban request was also denied. Now that I have pro, I still prefer Claude CLI. I would like to ask experienced users if there are any stable reverse proxies that don’t get banned. I really don’t like this client. | 通过 (无共识错误) |
| google/gemma-3-1b-it | "en": "Is the kiro reverse proxy currently being blocked seriously?", "zh-TW": "最近使用 GitHub 开源工具的反代 kiro,还没用两小时就封号了,申请解封也不给解。", "ja": "今頃 github 上のオープンソースツールによる反代 kiro が、現在も封鎖されているかどうかわからない。", "ko": "최근 github에서 오픈소스 도구를 사용한 반대 피싱이 현재까지 제보되지 않아, 해제 신청도 안 되는 것 같습니다.", "de": "Derzeit wird das kiro Reverse Proxy ernsthaft blockiert sein?“, "fr": "Actuellement, le reverse proxy kiro est-il bloqué sérieusement ?", "it": "Attualmente, il reverse proxy kiro è bloccato seriamente?" } <|im_end|> | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: Is kiro reverse proxy currently banned severely? Content: I used an open source tool from GitHub to reverse proxy kiro a few days ago and got banned within two hours. My unban request was also denied. Now I have pro, but still like using claude CLI. I want to ask seniors, is there a stable reverse proxy that won’t get banned? I really don't like this client. | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Is using a reverse proxy for kiro currently resulting in frequent bans? | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Is kiro reverse proxy currently banned? |
Major
[Accuracy]
"Title: Is kiro reverse proxy currently banned?"
Reason: Missing content translation. Only title was translated, entire content is missing. | The translation is too vague compared to the reference. | “封号严重吗” is about severity/frequency of account bans, not about the reverse proxy itself being banned. The nuance of account bans is lost. | The source asks if account bans are 'severe' (严重), not if the proxy itself is 'banned'. The hypothesis changes the meaning of the question. Also, the entire content section of the source text is missing from the hypothesis. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of 'kiro反代现在封号严重吗?' in en", "zh-TW": "Translation of 'kiro反代现在封号严重吗?' in zh-TW", "ja": "Translation of 'kiro反代现在封号严重吗?' in ja", "ko": "Translation of 'kiro反代现在封号严重吗?' in ko", "de": "Translation of 'kiro反代现在封号严重吗?' in de", "fr": "Translation of 'kiro反代现在封号严重吗?' in fr", "it": "Translation of 'kiro反代现在封号严重吗?' in it} | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Title: Is Kiro Reverse Proxy Seriously Being Blocked Now? |
Major
[Accuracy]
"Is Kiro Reverse Proxy Seriously Being Blocked Now?"
Reason: The hypothesis only translates the title and completely omits the entire content paragraph provided in the source. | Should be 'Are Kiro reverse proxy account bans severe right now?' - the hypothesis changes the meaning from account bans to being blocked | Content translation is missing entirely in hypothesis; only title is provided |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: My design of the architecture of the Dragon虾 agent | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Content: A single openclaw gateway, multiple agents. One main agent, one research agent, one encoding agent, one testing agent, and one home agent, which can be added later as needed. |
Minor
[Terminology]
"encoding agent"
Reason: Reference uses "coding agent"; "encoding" suggests a different technical role than general coding. | '编码' in this context refers to 'coding', not 'encoding'. | Reference uses 'coding agent' not 'encoding agent' | Should be 'coding agent' not 'encoding agent'. '编码' means coding/programming, not encoding. |
| Qwen/Qwen2.5-14B-Instruct | Title: My Lobster Agent Architecture Design Content: A single openclaw gateway, multiple agents. One main agent, one research agent, one coding agent, one testing agent, and one home agent. Additional agents can be added as needed. |
Major
[Terminology]
"Lobster Agent"
Reason: 'Lobster' should be OpenClaw, the specific product name, not the literal animal. | Failed to recognize '龙虾' as the slang for 'OpenClaw', translating it literally instead. | Should be 'OpenClaw' instead of 'Lobster Agent'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Lobster' | Should be 'OpenClaw agent architecture design'. 'Lobster' is slang; the proper term is 'OpenClaw'. |
| Qwen/Qwen2.5-7B-Instruct | Title: My LinuxDo Lobster Agent Architecture Design Content: Single openclaw gateway, multiple agents. One main agent, one research agent, one encoding agent, one testing agent, one home agent, more can be added as needed. |
Major
[Accuracy]
"My LinuxDo Lobster Agent Architecture Design"
Reason: Should be 'My OpenClaw agent architecture design' not 'LinuxDo Lobster'. LinuxDo is the platform name, not part of the product name. | The source says “我的龙虾agent架构设计”, which refers to OpenClaw (小龙虾) and has nothing to do with LinuxDo; adding “LinuxDo” introduces information that is not in the source. | Hallucinated 'LinuxDo' and mistranslated '龙虾' (slang for OpenClaw) as 'Lobster' instead of using the proper noun 'OpenClaw'.
Minor
[Terminology]
"encoding agent"
Reason: Should be 'one coding agent' not 'encoding agent'. The source says '编码agent' which means coding. | '编码' in this context refers to 'coding', not 'encoding'. | The source “编码agent” should be translated as “coding agent”; “encoding” does not match the established usage in a software development context. | Should be 'coding agent' not 'encoding agent' |
| Qwen/Qwen3-14B | Progettazione dell'architettura degli agenti OpenClaw |
Critical
[Accuracy]
"Progettazione dell'architettura degli agenti OpenClaw"
Reason: Severe omission. The hypothesis only translates the title, completely missing the content which lists the specific agent architecture (main, research, coding, testing, smart home agents). | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | Translates only the title and omits the main content listing multiple specific agents (main, research, coding, testing, smart home) and ability to add more. |
| Qwen/Qwen3.5-4B | Title: My Lobster Agent Architecture Design Content: A single openclaw gateway, multiple agents. One main agent, one research agent, one coding agent, one testing agent, one home agent, and additional agents can be added as needed later. |
Major
[Terminology]
"My Lobster Agent Architecture Design"
Reason: Mistranslation of the slang term '龙虾' (Lobster) which refers to 'OpenClaw' in this context, as evidenced by the source mentioning 'openclaw gateway' and the reference using 'OpenClaw'. | Incorrect terminology: 'Lobster' should be 'OpenClaw' as per reference and source context (小龙虾 = OpenClaw) | Should be 'OpenClaw' instead of 'Lobster'. |
| Qwen/Qwen3.5-9B | Title: My Lobster Agent Architecture Design Content: A single OpenClaw gateway, multiple agents. One main agent, one research agent, one coding agent, one testing agent, one home agent, with additional agents added as needed. |
Minor
[Accuracy]
"one home agent"
Reason: Should be 'smart home agent' per reference | Should be 'one smart home agent' to match reference and be more specific | Context suggests 'smart home agent' (as in reference), 'home agent' is slightly less precise. | Source says “家居agent” which corresponds better to “smart home agent”; “home agent” is less precise.
Major
[Terminology]
"Lobster Agent"
Reason: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Lobster' | Should be translated as 'OpenClaw' instead of 'Lobster' to maintain consistency with the reference. | Failed to recognize '龙虾' as the specific product name 'OpenClaw', translating it literally as 'Lobster'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | My OpenCLaw Agent Architecture Design | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: My Lobster Agent Architecture Design Content: A single openclaw gateway and multiple agents. One main agent, one research agent, one coding agent, one testing agent, and one home agent, which can be added as needed. |
Major
[Terminology]
"Lobster Agent"
Reason: Failed to recognize '龙虾' as the project name 'OpenClaw', translating it literally as 'Lobster'. | Should be 'OpenClaw' instead of 'Lobster' | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Lobster' | Should be 'OpenClaw agent architecture design' not 'Lobster Agent Architecture Design' | Product/project name 'OpenClaw' should not be translated as 'Lobster'; this obscures the specific tool being discussed.
Minor
[Accuracy]
"one home agent"
Reason: Translated '家居 agent' as 'home agent' instead of the more precise 'smart home agent'. | Should be 'one smart home agent' to be more specific | Should be 'smart home agent' as in reference |
| google/gemma-3-1b-it | Single openclaw gateway, multiple agents. |
Major
[Accuracy]
"Single openclaw gateway, multiple agents."
Reason: Hypothesis is incomplete and only translates the first sentence. It omits the detailed list of agents (main agent, research agent, coding agent, testing agent, home agent) and the ability to add more agents on demand. | The hypothesis omits the 'Title' translation entirely and only provides a fragment of the 'Content'. It misses the specific list of agents (main, research, coding, testing, smart home) and the note about adding agents on demand. | Only partially translates the content; omits the list of specific agents (main, research, coding, testing, smart home) and the ability to add more on demand. |
| google/gemma-3-4b-it | Title: My Lobster Agent Architecture Content: A single openclaw gateway, multiple agents. One main agent, one research agent, one coding agent, one testing agent, and one home agent. Agents can be added as needed. |
Major
[Terminology]
"My Lobster Agent Architecture"
Reason: Should be 'OpenClaw' not 'Lobster'. The source uses '龙虾' as a nickname for OpenClaw, but the proper term should be used in translation. | Source context is about the OpenClaw framework; “龙虾” here is product-related slang for OpenClaw, not literal lobsters. Should keep “OpenClaw” in the title to match established terminology. | Incorrect translation - 'Lobster' should be 'OpenClaw' (小龙虾 is a nickname for OpenClaw, not literal lobster) | Translates the slang '龙虾' literally as 'Lobster' instead of the established project name 'OpenClaw' used in the reference and context.
Minor
[Accuracy]
"home agent"
Reason: Source has “家居agent”, which in this context refers to smart home; “smart home agent” is more accurate than generic “home agent”. | Loses the specific nuance of '家居' (smart home) present in the source and reference ('smart home agent'). | The term 'home agent' is missing and should be 'smart home agent' to accurately represent the content. |
| google/translategemma-12b-it | Title: My openclaw Agent Architecture Design Content: A single openclaw gateway and multiple agents. One main agent, one research agent, one coding agent, one testing agent, and one home automation agent. More agents can be added as needed. |
Minor
[Terminology]
"home automation agent"
Reason: Source uses "家居agent" / reference "smart home agent"; "home automation" is understandable but less aligned with the established term. | While accurate, 'smart home agent' aligns better with the source '家居' in the context of consumer AI agents. | Should be 'smart home agent' to match reference terminology | Incorrect translation - reference uses 'smart home agent', not 'home automation agent' |
| google/translategemma-4b-it | Title: My Lobster Agent Architecture Content: A single openclaw gateway, multiple agents. One main agent, one research agent, one coding agent, one testing agent, and one home agent. Additional agents can be added as needed. |
Minor
[Accuracy]
"one home agent"
Reason: Should be 'smart home agent' per reference. | Source '家居' refers to 'smart home' (as seen in reference), 'home agent' is slightly less precise. | Should be 'one smart home agent' to match the reference and better convey '家居agent'. |
| tencent/HY-MT1.5-1.8B | Translation of '我的龙虾agent架构设计' in en |
Critical
[Accuracy]
"Translation of '我的龙虾agent架构设计' in en"
Reason: The translation is not provided and only includes a placeholder. | This is only a placeholder phrase and not a translation; it omits all concrete content about the OpenClaw gateway and multiple agents. | Hypothesis is metadata/template text rather than actual translation. Should provide the actual English translation of the title and content. | Hypothesis is a placeholder template rather than actual translation; missing content translation | The hypothesis outputs a meta-instruction string describing the task instead of performing the translation of the title and content. |
| tencent/HY-MT1.5-7B | My LinuxAgent Architecture Design |
Critical
[Accuracy]
"My LinuxAgent Architecture Design"
Reason: Hallucination/Mistranslation. The source says '龙虾' (Lobster/OpenClaw), not 'Linux'. The reference correctly identifies it as 'OpenClaw'. | Should be 'My OpenClaw agent architecture design' - completely wrong product name, LinuxAgent is not mentioned in source | Completely wrong translation: 'OpenClaw' was mistranslated as 'LinuxAgent', which is a brand name error | Completely wrong key term; 'OpenClaw agent' is replaced by 'LinuxAgent', changing the subject of the title. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial |
Major
[Accuracy]
"sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial"
Reason: Missing Title and Content - only the API key is present, reference has full Title and Content with proper translation of '爽蹬' to 'Spam/freeride' | Omission of the Title and most of the Content. The output is just an API key, losing all context. | Hypothesis only reproduces the key string and omits the surrounding title and API/model list content, so it is not a translation of the source. |
| CohereLabs/tiny-aya-water | sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial |
Critical
[Accuracy]
"sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial"
Reason: Missing URL and model list from the source. | Only outputs the key and omits title, tips, URL, and model list present in source; severe content omission. | Hypothesis only contains the API key without the title, URL, or model list. Should include full content from source. | Only outputs the API key, missing entire Title and Content fields from source | Severe omission. The hypothesis only contains the API key, missing the Title and the rest of the Content (URL, model list). |
| Qwen/Qwen2.5-14B-Instruct | [Mo API] Codex Smooth Ride - 20260306 |
Minor
[Accuracy]
"Codex Smooth Ride"
Reason: Title completely mistranslated - '爽蹬' means 'Spam/freeride' not 'Smooth Ride', missing 'Content' field entirely | Mistranslated the slang '爽蹬'. In this context regarding API keys, it implies 'freeriding' or 'spamming', not a pleasant ride. | Incorrect translation of '爽蹬'. Should be 'Spam/freeride Codex happily' or similar. The hypothesis completely misses the meaning. | Misinterprets slang '爽蹬' which here means something like 'spam/freeride (use for free/happily)', not 'smooth ride'. |
| Qwen/Qwen2.5-7B-Instruct | [Mo API] Codex Enjoy - 20260306 |
Minor
[Accuracy]
"Codex Enjoy"
Reason: Omission of content and incorrect translation of '爽蹬' as 'Enjoy' instead of 'Spam/freeride'. | Mistranslation of '爽蹬' which implies freeriding or abusing the API, not simply 'enjoying' it. | Should be 'Spam/freeride Codex happily' or similar. '爽蹬' is slang meaning to enjoy/take advantage, not simply 'Enjoy'. | Title incorrectly translated as 'Enjoy' when reference says 'Spam/freeride'; Content missing entirely |
| Qwen/Qwen3-14B | [Mo API] Codex Scandale - 20260306 |
Critical
[Accuracy]
"Codex Scandale"
Reason: Incorrect translation of '爽蹬' as 'Scandale' instead of 'Spam/freeride' | ‘爽蹬’ conveys happily/spam/freeride usage; ‘Scandale’ suggests scandal and changes meaning. | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. Also 'Scandale' does not accurately convey the meaning of '爽蹬'. | Mistranslation of '爽蹬'. In this context, '爽蹬' is slang implying a smooth, enjoyable, or freeride experience. 'Scandale' (scandal) carries a negative connotation that contradicts the source's positive/promotional tone. | Critical mistranslation: '爽蹬' means 'happily/freeride' (as in taking advantage of free resources), not 'Scandale' (scandal). This completely changes the meaning. |
| Qwen/Qwen3.5-4B | "en": "[Mo API] Codex Shuang Deng - 20260306\nContent: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "zh-TW": "[Mo API] 爽蹬 Codex - 20260306\n內容:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ja": "[Mo API] コデックス シュアンテン - 20260306\nコンテンツ:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ko": "[Mo API] 코덱스 쑹덴 - 20260306\n콘텐츠: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "de": "[Mo API] Codex Schuang Deng - 20260306\nInhalt: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "fr": "[Mo API] Codex Shuang Deng - 20260306\nContenu : [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "it": "[Mo API] Codex Shuang Deng - 20260306\nContenuto: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1 |
Major
[Accuracy]
"Codex Shuang Deng"
Reason: Failure to translate the slang '爽蹬' (Shuang Deng), which implies spamming or freeriding, into meaningful English as done in the reference ('Spam/freeride Codex happily'). | Incorrect transliteration 'Shuang Deng' vs reference meaning | Incorrect transliteration of '爽蹬'. Reference indicates this should be translated as 'Spam/freeride' or similar, not transliterated |
| Qwen/Qwen3.5-9B | "en": "[Mo API] Codex Shuangdeng - 20260306\nContent: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "zh-TW": "[Mo API] Codex 爽蹬 - 20260306\n內容:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ja": "[Mo API] Codex 爽蹬 - 20260306\nコンテンツ:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ko": "[Mo API] Codex 爽蹬 - 20260306\n콘텐츠: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "de": "[Mo API] Codex Shuangdeng - 20260306\nInhalt: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "fr": "[Mo API] Codex Shuangdeng - 2020260306\nContenu : [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "it": "[Mo API] Codex Shuangdeng - 20260306\nContenuto: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5. |
Major
[Accuracy]
"Shuangdeng"
Reason: The hypothesis contains a JSON structure with multiple languages instead of a direct translation. 'Shuangdeng' is a transliteration of slang '爽蹬' (freeride/spam) which should be translated for meaning. | Incorrect translation of '爽蹬' - should be 'happily' or 'freeride' per reference, not 'Shuangdeng' | Leaves “爽蹬” as pinyin/romanization instead of conveying the slang sense of happily spamming/freeriding Codex. | Incorrect translation of '爽蹬' - reference translates as 'Spam/freeride Codex happily' which better captures the slang meaning |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Mo API] Codex 爽蹬 - 20260306 |
Major
[Accuracy]
"爽蹬"
Reason: Left untranslated; should convey idea of spamming/freeriding or similar slang meaning. | The slang term '爽蹬' (implying free ride/spam/abuse) was not translated, leaving Chinese characters in the English output. | Should be translated as 'Spam/freeride' or similar, not left untranslated. The reference shows this should be interpreted as slang. | Incorrect translation: '爽蹬' is slang meaning 'spam/freeride', not a direct translation. Should be 'Spam/freeride' as in reference |
| google/gemma-3-12b-it | "en": "Title: [Mo API] Codex Speedy - 20260306\nContent: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "zh-TW": "標題:[Mo API] Codex 爽蹬 - 20260306\n內容:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ja": "タイトル:[Mo API] Codex 爽蹬 - 20260306\nコンテンツ:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ko": "제목: [Mo API] Codex 爽蹬 - 20260306\n내용: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "de": "Titel: [Mo API] Codex 爽蹬 - 20260306\nInhalt: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "fr": "Titre : [Mo API] Codex 爽蹬 - 20260306\nContenu : [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "it": "Titolo: [Mo API] Codex 爽蹬 - 20260306\nContenuto: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNj |
Major
[Accuracy]
"Codex Speedy"
Reason: 『爽蹬』 is slang closer to 'spam/freeride happily'; 'Speedy' changes the meaning away from free-riding / abusing and toward speed. | Incorrect translation of '爽蹬' - reference says 'Spam/freeride' but hypothesis says 'Speedy', completely wrong meaning | The translation of '爽蹬' is unclear and doesn't match reference 'Spam/freeride Codex happily'. The hypothesis uses 'Speedy' which is inaccurate. |
| google/gemma-3-1b-it | Translation of 'Mo API' with Codex 爽蹬 - 20260306 |
Minor
[Accuracy]
"Translation of 'Mo API' with Codex 爽蹬 - 20260306"
Reason: '爽蹬' is not translated; the meaning of happily spamming/freeriding Codex is missing, and the bracketed tag style from the title is altered. | The hypothesis is a meta-description stating 'Translation of...' rather than providing the actual translation. It fails to translate the slang term '爽蹬' (likely meaning spam/freeride or working smoothly depending on context, reference uses 'Spam/freeride') and omits the content entirely. | Hypothesis does not provide actual translation. Reference translates '爽蹬' as 'Spam/freeride' and provides full title translation, while hypothesis only describes what it is translating. | The hypothesis is a partial translation and does not capture the full meaning of the source. |
| google/gemma-3-4b-it | "en": "Title: [Mo API] Codex 爽蹬 - 20260306\nContent: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "zh-TW": "標題:[Mo API] Codex 爽蹬 - 20260306\n內容:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ja": "タイトル:[Mo API] Codex 爽蹬 - 20260306\nコンテンツ:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ko": "제목: [Mo API] Codex 爽蹬 - 20260306\n내용: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "de": "Titel: [Mo API] Codex 爽蹬 - 20260306\nInhalt: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "fr": "Titre : [Mo API] Codex 爽蹬 - 20260306\nContenu : [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "it": "Titolo: [Mo API] Codex 爽蹬 - 20260306\nContenuto: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd47 |
Major
[Accuracy]
"Codex 爽蹬"
Reason: Mistranslation of '爽蹬' - reference shows 'Spam/freeride' but hypothesis keeps Chinese characters unchanged | The hypothesis fails to translate the text, merely repeating the source Chinese characters and adding JSON formatting keys for multiple languages instead of providing the requested Chinese translation. | Left untranslated in hypothesis. Reference translates as 'Spam/freeride happily'. | Leaves “爽蹬” untranslated; reference clarifies it as something like spam/freeride/happily using. The key slang meaning is missing in the English part. |
| google/translategemma-12b-it | "en": "Title: [Mo API] Codex Quick Start - 20260306\nContent: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "zh-TW": "標題:[Mo API] Codex 快速入門 - 20260306\n內容:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ja": "タイトル:[Mo API] Codex クイックスタート - 20260306\n内容:[!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ko": "제목: [Mo API] Codex 빠른 시작 - 20260306\n내용: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "de": "Titel: [Mo API] Codex Schnellstart - 20260306\nInhalt: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "fr": "Titre : [Mo API] Codex Démarrage rapide - 20260306\nContenu : [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "it": "Titolo: [Mo API] Codex Guida introduttiva - 20260306\nContenuto: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKN |
Major
[Accuracy]
"Codex Quick Start"
Reason: The source slang '爽蹬' implies abusing the API for free/unlimited use (freeriding/spamming). 'Quick Start' completely misinterprets the illicit nature of the post. | Source slang "爽蹬" conveys free-riding/spamming happily, not "Quick Start"; the meaning is significantly altered. | Incorrect translation - reference says 'Spam/freeride Codex happily', not 'Codex Quick Start' | Mistranslation of '爽蹬'. Should be 'Spam/freeride Codex happily' not 'Codex Quick Start' |
| google/translategemma-4b-it | "en": "Title: [Mo API] Codex Smooth Ride - 20260306\nContent: [!tip] https://api.268526.eu.cc/v1 [!tip] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!tip] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "zh-TW": "標題:[Mo API] Codex 暢行無阻 - 20260306\n內容:[!提示] https://api.268526.eu.cc/v1 [!提示] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!提示] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ja": "タイトル:[Mo API] Codex スムーズな走行 - 20260306\nコンテンツ:[!ヒント] https://api.268526.eu.cc/v1 [!ヒント] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!ヒント] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "ko": "제목: [Mo API] Codex 부드러운 주행 - 20260306\n내용: [!팁] https://api.268526.eu.cc/v1 [!팁] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!팁] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "de": "Titel: [Mo API] Codex Sanfter Lauf - 20260306\nInhalt: [!Hinweis] https://api.268526.eu.cc/v1 [!Hinweis] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!Hinweis] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "fr": "Titre : [Mo API] Codex Conduite Fluide - 20260306\nContenu : [!Conseil] https://api.268526.eu.cc/v1 [!Conseil] sk-N0RtVvKqubIOrDifDreyBRKNjK16KeCd4726rzmzy1qroial[!Conseil] gpt-5,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-mini,gpt-5.1-codex-max,gpt-5.2,gpt-5.2-codex,gpt-5.3-code", "it": "Titolo: [Mo API] Codex Guida Fluida - 20260306\nContenuto: [!Suggerimento] https://api.268526.eu.cc/v1 [!Suggerimento] sk |
Major
[Accuracy]
"Codex Smooth Ride"
Reason: Source title slang “爽蹬” indicates carefree/free-riding or abusing a service; “Smooth Ride” misses the ‘freeride/spam’ nuance mirrored in the reference. | Inaccurate translation of '爽蹬'. Reference shows 'Spam/freeride Codex happily' which better captures the slang meaning. 'Smooth Ride' misses the intended meaning. | The slang '爽蹬' in this context (API sharing) implies abusing the service or 'freeriding/spamming', not a 'smooth ride'. The translation misses the negative connotation of abuse. | Source has '爽蹬' meaning 'Spam/freeride', hypothesis incorrectly translates as 'Smooth Ride'. |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
Critical
[Accuracy]
"Translation of '登录' in en"
Reason: The hypothesis fails to translate the source text (which contains API keys and model lists), instead outputting a meta-string about translating the word 'Login'. | Hypothesis is unrelated to source. Source is about API endpoints and model names, not login (登录). Completely incorrect translation target. | Complete mistranslation - source is about Codex API spam/freeride, hypothesis translates '登录' (login) which is completely unrelated | Placeholder text; it neither translates the given title nor matches the reference about Mo API and Codex models. |
| tencent/HY-MT1.5-7B | Title: [Mo API] Codex Suede - 20260306 |
Minor
[Accuracy]
"Codex Suede"
Reason: Incorrect translation: '爽蹬' (happy ride/freeride) was translated as 'Suede' which is completely wrong; reference uses 'Spam/freeride Codex happily' | Misinterprets slang '爽蹬' (spam/freeride/happily abuse); 'Suede' is semantically unrelated. | Severe mistranslation of slang. '爽蹬' (shuǎng dēng) is slang for using something freely/happily or a freeride. 'Suede' is a phonetic hallucination with no semantic connection. | Should be 'Spam/freeride Codex happily' or similar - 'Suede' is incorrect translation of '爽蹬' |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Problemi con la configurazione del sito pubblico Elysiver |
Critical
[Accuracy]
"Problemi con la configurazione del sito pubblico Elysiver"
Reason: Translation is in Italian, not English. Should be in English as per target language context. Should be 'Configuration issue with Elysiver free API proxy site'. | Wrong target language. The output is in Italian, but the target language is Chinese (zh). Also omits the Content section. | Wrong target language - hypothesis is in Italian instead of English (target language is zh, but source is Chinese and reference is English) | Misses "公益" nuance (public/free proxy/API) and reference to configuration "到cc上" (onto Claude Code or similar platform). |
| CohereLabs/tiny-aya-water | Title: Issues with Elysiver's Public Utility Station Configuration Content: Has anyone encountered this issue with Elysiver's Public Utility Station configuration on cc? |
严重
[术语]
"Public Utility Station"
理由: Incorrect translation - '公益站' means 'free API proxy site' not 'Public Utility Station' | Mistranslation of "公益站" in this context; it refers to a free API/proxy site, not a public utility station. | Should be 'free API proxy site' not 'Public Utility Station'. '公益站' refers to a free/public service proxy site. | Mistranslation of '公益站'. In this context, it refers to a 'free community/proxy site', not a public utility. | Incorrect translation of '公益站' as 'Public Utility Station' instead of 'free API proxy site'.
轻微
[准确性]
"on cc"
理由: Does not clarify that "cc" is Claude Code as in reference; slightly under-specific. | Incorrect translation - reference clarifies 'Claude Code' not 'cc' | Failed to expand slang 'cc' to 'Claude Code' as understood in context. | Missing context. Should reference 'Claude Code' (cc) and include the informal 'bros' tone from source ('佬们'). |
| Qwen/Qwen2.5-14B-Instruct | Title: Configuration Issue with Elysiver Free Shared API Site Content: Has anyone configured the Elysiver free shared API site on cc and encountered this issue? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Configuration Issues with Elysiver Free API Content: Has anyone encountered this problem when configuring Elysiver Free API on cc? |
轻微
[术语]
"Free API"
理由: Should be 'Configuration issue with Elysiver free API proxy site' to match reference more closely | Should be 'Elysiver free API proxy site' to match reference | Partial translation of '公益站' as 'Free API' instead of 'free API proxy site'. | The source term “公益站” corresponds more closely to “free/public proxy site/service”; the rendering “Free API” loses some of the “site” (站/站点) sense. |
| Qwen/Qwen3-14B | Title: Problem with Elysiver free shared API site configuration Content: Has anyone encountered this issue when configuring Elysiver free shared API site on cc? | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Elysiver Free Shared API Site Configuration Issue Content: I've configured Elysiver free shared API site on cc. Have you encountered this issue? |
严重
[准确性]
"on cc"
理由: Failure to expand the slang/abbreviation 'cc' to 'Claude Code' as identified in the reference. 'cc' alone is ambiguous in English. | “cc” in source is shorthand for Claude Code; hypothesis keeps an unexplained “cc” and omits the explicit Claude Code reference given in the reference. | Incorrect: 'cc' should be 'Claude Code' as per reference |
| Qwen/Qwen3.5-9B | Title: Issues with configuring Elysiver free shared API site Content: I've configured Elysiver free shared API site on cc. Have any of you encountered this issue? |
轻微
[准确性]
"free shared API site"
理由: Should be 'free API proxy site' to match reference terminology | 'cc' should be 'Claude Code' per reference | “公益站” here is a free API proxy site rather than general “shared API site”; misses the “proxy” nuance. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Elysiver Free API Configuration Issues |
严重
[准确性]
"Elysiver Free API Configuration Issues"
理由: The hypothesis only translates the title and omits the entire content section asking about the issue on 'cc' (Claude Code). | Should be 'Configuration issue with Elysiver free API proxy site' - more specific and accurate to source meaning | '公益站' refers to a free/public proxy site, not just 'Free API'; missing 'proxy site' aspect and 'on cc' configuration detail from content. | Incorrect: '公益站' means 'free API proxy site', not just 'Free API'. '公益' means 'public welfare/free' in this context |
| google/gemma-3-12b-it | Title: Issues with Elysiver Public Proxy Configuration Content: I configured Elysiver public proxy on cc. Has anyone encountered this issue? |
轻微
[术语]
"Public Proxy"
理由: 『公益站』here is a free API proxy site; 'public proxy' is understandable but less precise than 'free API proxy site' as in the reference. | Should be 'Elysiver free API proxy site' instead of 'Elysiver Public Proxy' | Translated '公益站' as 'Public Proxy' instead of 'free API proxy site' or 'charity site', missing the 'free' nuance.
严重
[准确性]
"on cc"
理由: Failed to expand the abbreviation 'cc' to 'Claude Code' based on context. | Source specifies '配置到cc上' meaning configured onto Claude Code; 'cc' is vague in English and may not be recognized as Claude Code without expansion. | Missing context. Should be 'When configuring the Elysiver free API proxy site on Claude Code' to match reference | Should be 'Claude Code' not 'cc' - unclear abbreviation |
| google/gemma-3-1b-it | Elysiver公益站配置到cc上,佬们遇到过这个问题吗? |
致命
[准确性]
"Elysiver公益站配置到cc上,佬们遇到过这个问题吗?"
理由: Hypothesis is the source text itself, not a translation. No translation to English was provided. | The hypothesis is identical to the source, but the reference is in English. The hypothesis does not match the reference. | The hypothesis copies the source text verbatim instead of translating it into the target language (which appears to be English based on the reference, despite the metadata saying Target Language: zh, the source is zh and reference is en, implying the task is zh->en. If the task was zh->zh, copying would be acceptable, but the reference indicates an English translation is expected. Given the pattern of other errors, the model failed to translate). Assuming the task is Source(ZH) -> Target(EN) based on Reference: The hypothesis failed to translate. |
| google/gemma-3-4b-it | Elysiver Public Proxy Configuration Issue |
严重
[准确性]
"Elysiver Public Proxy"
理由: “公益站” here is a free/public API proxy site, not simply “public proxy”; loses the API/proxy site nuance present in reference. | Incomplete translation - only title translated, content missing entirely | The term 'Public Proxy' is too generic and does not accurately represent the specific product 'Elysiver公益站'. | The hypothesis only translates the title and completely omits the content question regarding 'cc' (Claude Code) and the address to '佬们' (bros/experts). |
| google/translategemma-12b-it | Title: Issues with Elysiver Public Proxy Configuration |
严重
[准确性]
"Public Proxy"
理由: "公益站" in this context is closer to "free API proxy site" or "public/free site"; "proxy" is missing and may mislead about the nature of the service. | Incorrect translation - reference says 'Configuration issue with Elysiver free API proxy site' | Incomplete translation. Should be 'Configuration issue with Elysiver free API proxy site' to match reference | The hypothesis omits the entire content section of the source text, which contains the core question about configuring on 'cc' (Claude Code). |
| google/translategemma-4b-it | Title: Troubleshooting Elysiver Public API Configuration |
严重
[准确性]
"Public API"
理由: “公益站” here refers to a free/pro bono proxy site, not specifically a public API. The reference indicates ‘free API proxy site’; reducing it to ‘Public API’ omits ‘site’ and ‘proxy’ aspects. | Inaccurate translation. Should be 'Configuration issue with Elysiver free API proxy site' to match reference. 'Troubleshooting' adds interpretation not in source. | The translation is accurate but could be more precise by including 'free' and 'proxy' as in the reference. | Should be 'Configuration issue with Elysiver free API proxy site' per reference. Hypothesis adds 'Troubleshooting' which is not in source. | The hypothesis omits the entire content section of the source text. '公益站' is translated as 'Public API' but reference suggests 'free API proxy site', which is more accurate in this context. 'cc' is not translated/explained (Reference: Claude Code). |
| tencent/HY-MT1.5-1.8B | Translation of 'Elysiver公益站配置的问题' in en |
致命
[准确性]
"Translation of 'Elysiver公益站配置的问题' in en"
理由: The translation is not provided and only includes a placeholder. | Meta description instead of a concrete translation; misses the Elysiver free proxy site configuration context. | The hypothesis outputs a meta-instruction string instead of the actual translation of the title and content regarding Elysiver configuration. | Hypothesis is a placeholder template rather than actual translation; missing content translation | Hypothesis is metadata/template text rather than actual translation. Should provide actual translated content. |
| tencent/HY-MT1.5-7B | Title: Issues with Elysiver Public Welfare Site Configuration |
轻微
[术语]
"Public Welfare Site"
理由: '公益站' (free API proxy site) is translated as 'Public Welfare Site' which is a literal translation; reference uses 'free API proxy site' | Literal rendering; context is a free API proxy site. 'Free API proxy site' or similar would better match domain usage. | Literal translation of '公益站'. In this context, it refers to a 'free API proxy site' or community resource, not a charitable organization. | Should be 'Configuration issue with Elysiver free API proxy site' - more specific terminology |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: I spent an entire night writing an OpenClaw plugin for small shrimp, used to configure third-party proxy APIs as food for small shrimp through WebUI |
严重
[术语]
"small shrimp"
理由: Mistranslation of '小龙虾' which refers to the specific tool 'OpenClaw', not literal small shrimp. | Incorrect translation of '小龙虾' as 'small shrimp' instead of 'OpenClaw'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'small shrimp' | Should be 'OpenClaw' not 'small shrimp'. '小龙虾' is slang for OpenClaw in this context. |
| CohereLabs/tiny-aya-water | Title: I spent an entire night writing an OpenClaw plugin for small shrimp, which uses WebUI to configure third-party transfer APIs as food for small shrimp. |
严重
[准确性]
"plugin for small shrimp"
理由: "小龙虾" is a nickname for OpenClaw itself, not literal small shrimp; plugin is for OpenClaw, not for shrimp. | Reference says 'to feed OpenClaw' not 'as food for small shrimp' | Mistranslation of '小龙虾' (OpenClaw) as literal 'small shrimp'.
严重
[术语]
"small shrimp"
理由: Should be 'OpenClaw' not 'small shrimp'. '小龙虾' is slang for OpenClaw in this community context. | Incorrect translation of '小龙虾' as 'small shrimp' instead of 'OpenClaw'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'small shrimp' |
| Qwen/Qwen2.5-14B-Instruct | Title: "Self-Introduction for Open Source" Spent all night writing an OpenClaw lobster plugin to configure third-party relay APIs as lobster feed via WebUI Content: The popularity of OpenClaw lobster has been increasing, ranking first on Github, but there are many newbies among its users... |
严重
[术语]
"lobster plugin"
理由: Should be 'OpenClaw plugin'. 'Lobster' is slang; proper terminology is 'OpenClaw'. | Should use the product name OpenClaw, not literal 'lobster', for clarity and consistency. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'lobster' | Literal translation of '小龙虾插件' instead of using the recognized term 'OpenClaw plugin'. |
| Qwen/Qwen2.5-7B-Instruct | Title: 'Self-recommendation for Open Source' Stayed Up All Night to Write an OpenClaw Shrimp Lobster Plugin, Used to Configure Third-party Relay APIs via WebUI as Food for Shrimp Lobster Content: The popularity of OpenClaw Shrimp Lobster has been increasing instead of decreasing, ranking first on Github, but there are particularly many novice players in its user base... |
严重
[术语]
"Shrimp Lobster"
理由: Should be 'OpenClaw' not 'Shrimp Lobster' - incorrect terminology | Incorrect translation of '小龙虾' as 'Shrimp Lobster' instead of 'OpenClaw'. | Redundant translation. Should be just 'OpenClaw plugin' not 'Shrimp Lobster Plugin' | In the source, “小龙虾” is a nickname for OpenClaw; the repeated “Shrimp Lobster” is unnatural and does not convey the project name, and the reference simplifies it to OpenClaw. |
| Qwen/Qwen3-14B | Autopromozione: Ho passato tutta la notte a creare un plugin OpenClaw Lobster per configurare API di terze parti tramite WebUI come alimenti per i granchi |
致命
[准确性]
"plugin OpenClaw Lobster"
理由: Redundant/misleading; ‘OpenClaw小龙虾插件’ refers to an OpenClaw plugin, not a ‘Lobster’ plugin as separate term. | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | Severe omission. The hypothesis only translates the title (and poorly, mixing Lobster and Crab), completely missing the content about the hype, GitHub trending, and novice users. |
| Qwen/Qwen3.5-4B | Title: "Self-recommendation" Spent all night writing an OpenClaw lobster plugin to configure third-party proxy APIs as food for the lobsters via WebUI Content: The popularity of OpenClaw lobster keeps rising despite being on GitHub's top list, but there are especially many novice players among its user base... |
严重
[准确性]
"food for the lobsters"
理由: Literal translation of the metaphor '食料' (food) for 'lobsters', whereas the meaning is configuring APIs for the 'OpenClaw' tool. The reference correctly interprets this as 'to feed OpenClaw'. | Awkward phrasing; should be 'plugin to configure third-party proxy APIs via WebUI to feed OpenClaw' per reference | Metaphorical “小龙虾食料” means API input/usage for OpenClaw; literal ‘food for the lobsters’ obscures the technical metaphor compared with reference’s “to feed OpenClaw” which keeps branding clear. | Reference says 'to feed OpenClaw' not 'food for the lobsters' |
| Qwen/Qwen3.5-9B | Title: 'Self-recommended open source' spent all night writing an OpenClaw crayfish plugin to configure third-party relay APIs as crayfish feed via WebUI Content: The popularity of OpenClaw crayfish has been increasing rather than decreasing, ranking first on Github, but there are particularly many novice players among the user base... |
严重
[准确性]
"crayfish plugin"
理由: “小龙虾” is the OpenClaw nickname/brand; literal “crayfish” loses the product reference. | '小龙虾' (crayfish) is slang for OpenClaw and should be translated as such. 'crayfish feed' is incorrect metaphor; should be 'to feed OpenClaw' | Metaphor 'feed' is confusing in English without context; 'to feed OpenClaw' or 'as input for OpenClaw' is clearer.
严重
[术语]
"crayfish plugin"
理由: Should be 'OpenClaw plugin'. 'Crayfish' is a literal translation of the slang nickname. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'crayfish' | Should be translated as 'OpenClaw' instead of 'crayfish' to maintain consistency with the reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: "Open Source Self-Presentation" Wrote an OpenClaw Small Shrimp Plug-in All Night to Use Third-Party API Middleware Through Web Interface for Small Shrimp Food |
严重
[术语]
"Small Shrimp Plug-in"
理由: Incorrect translation of '小龙虾插件' as 'Small Shrimp Plug-in' instead of 'OpenClaw'. | '小龙虾' is a nickname for OpenClaw, not literal small shrimp; redundant and misleading when combined with 'OpenClaw'. | Should be 'OpenClaw plugin' not 'Small Shrimp Plug-in'. '小龙虾' is slang for OpenClaw. | Incorrect: '小龙虾' is slang for 'OpenClaw', should be 'OpenClaw' not 'Small Shrimp' |
| google/gemma-3-12b-it | Title: "Open Source Self-Recommendation" I stayed up all night writing an OpenClaw Lobster plugin to use third-party relay APIs as Lobster food through the WebUI Content: The popularity of OpenClaw Lobster has been steadily increasing, ranking first on Github, but there are especially many novice players among its user base... |
严重
[术语]
"Lobster plugin"
理由: The branded project is OpenClaw; adding 'Lobster' as part of the product name can confuse it with a generic concept rather than the specific framework. | Should be 'OpenClaw' instead of 'Lobster' | Redundant. Should be 'OpenClaw plugin' not 'OpenClaw Lobster plugin' | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Lobster' |
| google/gemma-3-1b-it | OpenClaw small shrimp popularity is constantly rising, it's been on the first place in GitHub's leaderboard, but the user group is particularly many players... | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: "Open Source Recommendation" Spent a whole night writing an OpenClaw crayfish plugin to configure third-party relay APIs as crayfish food through the WebUI. |
严重
[术语]
"crayfish"
理由: “小龙虾” is community nickname for OpenClaw; using literal “crayfish” in this technical context loses the product reference and established term. | Should be 'OpenClaw' not 'crayfish'. While '小龙虾' literally means crayfish, in this context it's a nickname for OpenClaw and should be translated as such. | Incorrect translation - 'crayfish' should be 'OpenClaw' (小龙虾 is a nickname for OpenClaw)
严重
[准确性]
"crayfish food"
理由: Metaphorical use is acceptable but may obscure that these are third-party proxy APIs used as model backends; slightly less clear than “feed OpenClaw” but meaning mostly preserved. | Translates the metaphor '食料' (food/fuel for the agent) too literally. In this context, it refers to API inputs or resources for OpenClaw, not actual biological food. | Awkward phrasing. Reference uses 'feed OpenClaw' which is clearer and more natural. |
| google/translategemma-12b-it | Title: "Open Source Self-Introduction" - I stayed up all night and wrote an OpenClaw plugin called "Little Lobster," which allows you to configure third-party relay APIs through the WebUI as food for OpenClaw. |
严重
[准确性]
"called "Little Lobster,""
理由: Misinterprets the syntax. The plugin is for 'OpenClaw Little Lobster' (the project), not named 'Little Lobster' itself. The source says 'OpenClaw 小龙虾插件' (OpenClaw [project] plugin). | Incorrect interpretation. Should be '[Open Source Self-Recommendation] Stayed up all night writing an OpenClaw plugin' not 'called Little Lobster' | In the source, OpenClaw is nicknamed "小龙虾"; the plugin is not named "Little Lobster". The hypothesis incorrectly renames the plugin and shifts the metaphor. |
| google/translategemma-4b-it | Title: "Open Source Recommendation" Spent a night writing an OpenClaw crayfish plugin to configure third-party relay APIs as crayfish food via WebUI. |
严重
[术语]
"crayfish plugin"
理由: “小龙虾” is the nickname for OpenClaw itself; the reference clarifies this is an OpenClaw plugin. ‘Crayfish plugin’ misinterprets it as a generic animal-related plugin instead of an OpenClaw plugin. | Should be 'OpenClaw plugin' not 'crayfish plugin' - OpenClaw is the brand name. | Inconsistent terminology. '小龙虾' is used as a slang term for OpenClaw throughout the community. Should use 'OpenClaw' or maintain consistency with 'lobster' not 'crayfish'. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '开源自荐' with LinuxDo community context.", "zh-TW": "開源自荐"的翻譯,在LinuxDo社區中。", "ja": "開源自荐"の翻訳、LinuxDoコミュニティ内で。", "ko": "開源自荐"의 번역, LinuxDo 커뮤니티에서。", "de": "Translation of '開源自荐' in de", "fr": "Translation of '開源自荐' in fr", "it": "Translation of '開源自荐' in it"} |
致命
[准确性]
""en": "Translation of '开源自荐' with LinuxDo community context."..."
理由: Hypothesis is metadata/template text with descriptions rather than actual translations. Should provide real translated content. | The hypothesis is a malformed JSON object containing translation prompts for multiple languages instead of the required Chinese-to-English translation of the source text. | The translation is not provided and only includes a placeholder. |
| tencent/HY-MT1.5-7B | "en": "Title: Creating an OpenClaw Lobster Plugin All Night to Configure Third-Party Transfer APIs via WebUI as Food for the Lobsters", "zh-TW": "Title: 花了一整晚制作OpenClaw龙虾插件,以便通过WebUI配置第三方转移API作为龙虾的食物", "ja": "Title: 一整晚都在制作OpenClaw龙虾插件,用于通过WebUI配置第三方转移API作为龙虾的食物", "ko": "Title: 하룻밤을 쏟아서 OpenClaw龙虾插件를 제작했으며, 이를 통해 WebUI를 사용하여 제3자 전송API를 설정하여龙虾의 먹이로 사용할 수 있다", "de":="Titel: Eine ganze Nacht lang wurde an der Entwicklung des OpenClaw-Lobster-Plugins gearbeitet, um über die WebUI Drittanbieter-Transfer-APIs als Futter für die Lobster konfigurieren zu können", "fr":="Titre: Une nuit entière a été consacrée à la création d’un plugin OpenClaw pour lobster, afin de pouvoir configurer des API de transfert de tiers via la WebUI en tant que nourriture pour les lobster", "it":="Titolo: Ho trascorso tutta la notte a creare un plugin per il lobster OpenClaw, al fine di configurare le API di trasferimento di terzi tramite la WebUI come cibo per i lobster"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Personal insights on systemizing AI development processes. GSD handles 'how to break down + how to parallelize', while Superpowers handle 'how to write + how to ensure quality'. They are actually complementary relationships - using GSD for overall planning and calling upon Superpowers' TDD skills during critical tasks is the current best practice. | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Personal insights on systemizing AI development processes. GSD handles 'how to decompose + how to parallelize', while Superpowers handle 'how to write + how to ensure quality'. They are actually complementary relationships - using GSD for overall planning and calling Superpowers' TDD skills during key tasks is the current best practice | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | "en": "Title: Personal insights on systematic development processes for AI systems. GSD manages 'how to break down and parallelize', while Superpowers handles 'how to write and ensure quality'. They complement each other - using GSD for overall planning and invoking Superpowers' TDD skills in critical tasks is currently the best practice. Content: I've set up a more stable development process with GSD + Superpowers. My current relatively stable approach is: GSD manages 'how to break down and parallelize'...", "zh-TW": "標題:關於AI系統化的開發流程個人體會。GSD 負責「如何拆解和並行」,Superpowers 負責「如何寫作和確保品質」。兩者其實互補-使用 GSD 進行整體規劃,在關鍵任務中調用 Superpowers 的 TDD 技能,是目前最佳實踐。內容:我用 GSD + Superpowers 構建了一個更穩固的開發流程。我現在比較穩定的做法是:GSD 負責「如何拆解和並行」...", "ja": "タイトル:AIのシステム開発プロセスに関する個人的な洞察。GSDは『どのように分割し、並行処理するか』を管理し、Superpowersは『どのように書くか、そして品質を保証するか』を管理します。これらは補完関係にあります—全体計画のためにGSDを使用し、重要なタスクではSuperpowersのTDDスキルを呼び出すことが現在最良の実践です。コンテンツ:私はGSDとSuperpowersを使ってより安定した開発プロセスを作りました。私の現在の比較的安定したアプローチは:GSDが『どのように分割し、並行処理するか』を管理します...", "ko": "제목: AI 시스템 개발 프로세스에 대한 개인적 인사이트. GSD는 ‘어떻게 분할하고 병렬화하는지’를 관리하며, Superpowers는 ‘어떻게 작성하고 품질을 보장하는지’를 관리합니다. 이들은 상호보완적인 관계입니다 — 전체 계획을 위해 GSD를 사용하고 중요한 작업에서는 Superpowers의 TDD 기술을 호출하는 것이 현재 최선의 실천 방법입니다. 내용: 저는 GSD와 Superpowers를 사용하여 더 안정적인 개발 프로세스를 세웠습니다. 제 현재 비교적 안정된 접근 방식은: GSD가 ‘어떻게 분할하고 병렬화하는지’를 관리합니다...", "de": "Titel: Persönliche Einsichten zu systematischen Entwicklungsprozessen für KI-Systeme. GSD verwaltet 'wie man aufteilt und parallelisiert', während Superpowers 'wie man schreibt und Qualität gewährleistet' verwalten. Sie ergänzen sich gegenseitig – mit GSD wird das Gesamtkonzept geplant und in kritischen Aufgaben die TDD-Fähigkeiten von Superpowers eingesetzt, was derzeit die beste Praxis ist. Inhalt: Ich habe eine stabilere Entwicklungsumgebung mit GSD + Superpowers aufgebaut. 
Mein aktueller stabilster Ansatz ist: GSD verwaltet 'wie man aufteilt und parallelisiert'...", "fr": "Titre : Réflexions personnelles sur les processus de développement systématiques pour les systèmes d'intelligence artificielle. GSD gère « comment découper et paralléliser », tandis que Superpowers s'occupe de « comment écrire et garantir la qualité ». Ils sont complémentaires – utiliser GSD pour l'organisation globale et invoquer les compétences TDD de Superpowers dans les tâches clés est actuellement la meilleure pratique. Contenu : J'ai mis en place un processus de développement plus stable avec GSD + Superpowers. Ma méthode actuelle relativement stable est : GSD gère « comment découper et paralléliser »...", "it": "Titolo: Riflessioni personali sui processi di sviluppo sistematici per i sistemi di intelligenza artificiale. GSD gestisce 'come suddividere e paralleleggiare', mentre Superpowers si occupa 'di come scrivere e assicurare la qualità'. Sono complementari – utilizzare GSD per il piano generale e richiamare le competenze TDD di Superpowers nelle attività chiave è attualmente la migliore pratica. Contenuto: Ho costruito un processo di sviluppo più stabile con GSD + Superpowers. Il mio approcc | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Personal Reflections on Systematized AI Development Workflow. GSD handles 'how to break down + how to parallelize', while Superpowers handles 'how to write + how to ensure quality'. The two are complementary — using GSD for overall planning and leveraging Superpowers’ TDD skills in key tasks is currently the best practice. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Hypothesis is incomplete/invalid as text; no actual translation of source title or content is provided. | Severe omission. The hypothesis contains only ellipsis and a brace, failing to translate any part of the source text (Title or Content). | Critical: The hypothesis is essentially empty/malformed - just contains '...}' which is not a translation at all. Missing the entire content. | Hypothesis is incomplete/malformed. Contains only '...}' instead of a proper translation in Chinese. |
| Qwen/Qwen3.5-4B | Title: Personal reflections on systematic development processes for AI. GSD handles "how to decompose and how to parallelize", while Superpowers manages "how to write and how to ensure quality". These two are complementary—using GSD for overall planning and invoking Superpowers' TDD skills in critical tasks is currently best practice. Content: I have built a more stable development process using GSD + Superpowers. My current stable approach is: GSD handles "how to decompose and how to parallelize"... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | "en": "Title: Personal reflections on a systematic development process for AI. GSD manages \"how to break down + how to parallelize\", while Superpowers handles \"how to write + how to ensure quality\". The two are actually complementary—using GSD for overall planning and invoking Superpowers' TDD skills in key tasks is currently best practice.", "zh-TW": "標題:關於 AI 系統化開發流程的個人感悟。GSD 負責「怎麼拆解 + 怎麼並行」,Superpowers 負責「怎麼寫 + 怎麼保證質量」。兩者其實是互補關係——用 GSD 進行整體規劃,在關鍵任務中調用 Superpowers 的 TDD 技能,是目前的最佳實踐。\n內容:我使用 GSD + Superpowers 搭建了一套更穩健的開發流程。我目前比較穩定的一套做法是:GSD 負責「怎麼拆解 + 怎麼並行」...", "ja": "タイトル:AI の体系的な開発プロセスに関する個人的な感想。GSD は「どのように分解するか+どのように並列化するか」を管理し、Superpowers は「どのように書くか+どのように品質を保証するか」を担当します。両者は実際には補完関係にあります——全体計画に GSD を使い、重要なタスクで Superpowers の TDD スキルを呼び出すのが現在のベストプラクティスです。\nコンテンツ:私は GSD + Superpowers を使ってより安定した開発プロセスを構築しました。現在私が比較的安定しているやり方は:GSD が「どのように分解するか+どのように並列化するか」を管理し...", "ko": "제목: AI 의 체계적인 개발 프로세스에 대한 개인적 감상. GSD 는 \"어떻게 분해하고 병렬화하는가\"를 관리하며, Superpowers 는 \"어떻게 작성하고 품질을 보장하는가\"를 담당합니다. 두 가지는 실제로 상호 보완적입니다 — 전체 계획에는 GSD 를 사용하고, 중요한 작업에서 Superpowers 의 TDD 기술을 호출하는 것이 현재 최고의 관행입니다.\n콘텐츠: 저는 GSD + Superpowers 를 사용하여 더 안정적인 개발 프로세스를 구축했습니다. 제가 현재 비교적 안정적으로 사용하는 방법은: GSD 가 \"어떻게 분해하고 병렬화하는가\"를 관리하며...", "de": "Titel: Persönliche Gedanken zu einem systematischen Entwicklungsprozess für KI. GSD verwaltet „wie man aufteilt + wie man parallelisiert", während Superpowers sich mit „wie man schreibt + wie man Qualität sicherstellt" befasst. Beide sind tatsächlich komplementär – die Verwendung von GSD für die Gesamtplanung und das Aufrufen der TDD-Fähigkeiten von Superpowers bei kritischen Aufgaben ist derzeit die beste Praxis.", "fr": "Titre : Réflexions personnelles sur un processus de développement systématique pour l'IA. GSD gère « comment décomposer + comment paralléliser », tandis que Superpowers s'occupe de « comment écrire + comment garantir la qualité ». 
Les deux sont en réalité complémentaires : utiliser GSD pour la planification globale et invoquer les compétences TDD de Superpowers dans les tâches clés constitue actuellement la meilleure pratique.", "it": "Titolo: Riflessioni personali su un processo di sviluppo sistematico per l'IA. GSD gestisce \"come scomporre + come parallelizzare\", mentre Superpowers si occupa di \"come scrivere + come garantire la qualità\". I due sono in realtà complementari: utilizzare GSD per la pianificazione generale e richiamare le competenze TDD di Superpowers nei compiti chiave è attualmente la migliore pratica." } | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | About AI Systemized Development Process Personal Insights. GSD handles "how to break down + how to parallelize", while Superpowers handles "how to write + how to ensure quality". They are complementary relationships - using GSD for overall planning and invoking Superpowers' TDD skills in critical tasks is the best current practice. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Personal Reflections on Systematic AI Development Process. GSD manages "how to break down + how to run in parallel", and Superpowers manages "how to write + how to ensure quality". The two are actually complementary - using GSD for overall planning and calling Superpowers' TDD skills in key tasks is our current best practice Content: I have built a more stable development process with GSD + Superpowers. My relatively stable approach now is: GSD manages "how to break down + how to run in parallel"... | 通过 (无共识错误) |
| google/gemma-3-1b-it | Regarding AI system development process and personal reflections, the Superpowers system is how to write and ensure quality. |
严重
[准确性]
"Regarding AI system development process and personal reflections, the Superpowers system is how to write and ensure quality."
理由: Only partially reflects the source; omits GSD’s role, the complementary relationship, and the best-practice context, thereby losing key meaning. | The hypothesis omits the explanation of what 'GSD' handles ('how to break down + how to parallelize'). It also fails to mention the complementary relationship and the best practice of using both together. | Hypothesis only partially translates the title and misses the key information about GSD, the complementary relationship, and best practices mentioned in the reference. |
| google/gemma-3-4b-it | Title: Personal Reflections on a Systematic Development Process for AI. GSD (Guide System Design) manages "how to decompose + how to parallelize", and Superpowers manage "how to write + how to ensure quality". The two are complementary – using GSD for overall planning and calling Superpowers’ TDD skills in key tasks is currently best practice. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Systematic Development Process for AI - Personal Reflections. GSD manages "how to break down tasks and how to run them in parallel". Superpowers manages "how to write code and how to ensure quality". The two are actually complementary - using GSD for overall planning and calling on Superpowers' TDD skills within key tasks is the current best practice. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Personal Reflections on Systemic AI Development Process. GSD (How to break down + How to parallelize), Superpowers (How to write + Quality Assurance). These two are complementary – using GSD for overall planning and calling upon Superpowers' TDD skills is currently the best practice. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
致命
[准确性]
"Translation of '登录' in en"
理由: The hypothesis completely ignores the source text about AI development processes and outputs a meta-string about translating 'Login'. | Complete mistranslation - source discusses GSD and Superpowers development methodology, hypothesis translates '登录' (login) which is completely unrelated | Hypothesis is completely unrelated to source. Source discusses AI development processes with GSD and Superpowers, not login (登录). | Completely unrelated placeholder; does not translate or reflect any of the development process content (GSD, Superpowers, TDD). |
| tencent/HY-MT1.5-7B | "en": "I have developed a more stable development process using GSD + Superpowers. My current stable approach is: GSD handles ‘how to break down tasks + how to parallelize them’...", "zh-TW": "我已使用 GSD + Superpowers 开发出了一套更为稳定的开发流程。我目前采用的稳定方法是:GSD负责‘如何分解任务 + 如何并行处理它们’..."..., "ja": "私は GSD + Superpowersを使ってより安定した開発プロセスを構築しました。現在採用している安定した方法は、GSDが‘どのようにタスクを分割するか + どのように並行処理するか’を担当しています..."..., "ko": "저는 GSD + Superpowers를 사용하여 더욱 안정적인 개발 프로세스를 만들었습니다. 현재 사용하고 있는 안정적인 방법은, GSD가 ‘어떻게 작업을 분할하는지 + 어떻게 병렬 처리하는지’를 담당하고 있습니다..."..., "de": "Ich habe mit GSD + Superpowers einen stabileren Entwicklungprozess entwickelt. Meine derzeitige stabile Methode ist: GSD kümmert sich um ‘wie man Aufgaben unterteilt + wie man sie parallelisiert’...", "fr": "J’ai développé un processus de développement plus stable en utilisant GSD + Superpowers. Ma méthode actuelle et stable est : GSD s’occupe de ‘comment diviser les tâches + comment les paralléliser’...", "it": "Ho sviluppato un processo di sviluppo più stabile utilizzando GSD + Superpowers. La mia metodologia attuale e stabile è: GSD si occupa di ‘come suddividere le attività + come eseguirle in parallelo’..."}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: I am trying to understand why my VPN is not working properly. |
致命
[准确性]
"I am trying to understand why my VPN is not working properly."
理由: Completely inaccurate translation. The source asks about proxy quality affecting access speed to a specific site, not about VPN not working. Missing the title entirely. | Hallucination/Complete rewrite. The hypothesis invents a generic statement instead of translating the specific source text about proxy quality and scripts. Omits Title. | Incorrect translation - '机场' means proxy provider/VPN service, but translation is too vague and doesn't capture the meaning of the original question about proxy quality affecting LINUX DO access | Does not translate any of the specific content about proxy provider quality, access rate limits, or scripts; it is a generic paraphrase and misses core meaning. |
| CohereLabs/tiny-aya-water | Content: Are there any known issues with your VPN server? If so, please provide details. |
致命
[准确性]
"Are there any known issues with your VPN server? If so, please provide details."
理由: Completely unrelated to source content. Source asks about proxy quality affecting access rates to LINUX DO, not asking for VPN server issues. | Completely mistranslated - source asks about proxy provider quality affecting LINUX DO access, not VPN server issues | Completely ignores source question about whether poor proxy/airport quality limits access rate and instead invents a generic support question; meaning is not preserved. | Hallucination. The content is completely unrelated to the source text. Missing Title translation. |
| Qwen/Qwen2.5-14B-Instruct | Title: Poor proxy provider quality affecting L station access speed? Content: Is it because my proxy provider is not good? Today I often encounter this issue: I don't think I'm running any scripts (can't run these either? They seem like normal functions) | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Poor quality proxy will affect L station access speed? Content: Is my proxy quality poor? I often encounter this issue today: I haven't run any scripts (are these also not allowed, they all seem to be normal functions) |
严重
[准确性]
"L station"
理由: Should be 'Will a poor quality proxy provider affect or limit the access rate to LINUX DO?' - 'L station' is unclear, should specify LINUX DO | In context and in the reference, “L站” is LINUX DO; the literal rendering “L station” does not convey the specific site name. | Mistranslation of 'L 站', which is a specific community nickname for 'LINUX DO'. Translating it literally as 'L station' loses the specific referent and meaning intended in the source. |
| Qwen/Qwen3-14B | Una cattiva qualità del provider influisce sulla velocità di accesso al sito L? |
致命
[准确性]
"sito L"
理由: ‘L站’ here refers to a specific site (LINUX DO); rendered only as ‘sito L’, losing the intended reference. | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | Severe omission. The hypothesis only translates the title, completely missing the content where the user describes the specific situation and questions about scripts. |
| Qwen/Qwen3.5-4B | Title: Poor airport quality may limit access speed on L node? Content: Is my proxy provider's quality poor? I often encounter this situation today. I probably haven't activated any scripts (can't these be used either? They feel like normal features). |
严重
[准确性]
"L node"
理由: Mistranslation of 'L 站' (L Station), which is a specific community nickname for 'LINUX DO', as 'L node'. | Incorrect translation; '机场' is slang for proxy provider, 'L站' refers to LINUX DO; should be 'Will a poor quality proxy provider affect or limit the access rate to LINUX DO' | Incorrect: 'L node' should be 'LINUX DO' as per reference | Misinterprets “L站” (Linux DO in reference) as “L node”, changing the referent site/service. |
| Qwen/Qwen3.5-9B | Title: Poor proxy quality affects L-site access speed? Content: Is my proxy provider's quality poor? I often encounter this situation today: I shouldn't have any scripts running (can't these be run either, they seem like normal features) |
严重
[准确性]
"L-site"
理由: Failed to expand 'L 站' to its specific meaning 'LINUX DO' (a specific forum), rendering it vague. | Does not capture that “L站” refers to a specific site/service (e.g. LINUX DO); vague initialism loses referent. | Should be 'LINUX DO' - 'L站' refers to a specific platform, not just 'L-site'. Reference clarifies this | 'L-site' should be 'LINUX DO' per reference |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Is your proxy quality poor affecting your access speed on L站? |
严重
[准确性]
"Is your proxy quality poor affecting your access speed on L站?"
理由: The hypothesis translates only the title and omits the entire content section of the source text, resulting in significant missing information. | Incorrect: 'L站' likely refers to 'LINUX DO' (a site), not literally 'L站'. The translation 'L站' is kept but unexplained | Incorrect subject; should refer to the user's own proxy quality, not 'your'. Also omits notion of 'rate limiting' implied by '受限' in title. | Should be 'Will a poor quality proxy provider affect or limit the access rate to LINUX DO?' - more accurate interpretation of source |
| google/gemma-3-12b-it | Will poor proxy quality affect L station access speed? |
严重
[准确性]
"Will poor proxy quality affect L station access speed?"
理由: Major mistranslation: 'L station' is unclear, reference specifies 'LINUX DO'; also missing Content portion entirely | Title omits 'limit' nuance and does not disambiguate 'L站' (here likely LINUX DO). It should reflect both impact and rate limiting as in the reference. | Incomplete translation. Missing content section entirely. Should include full title and content as in reference. | Incomplete translation; omitted the entire 'Content' section of the source text. |
| google/gemma-3-1b-it | Is my airport quality bad and limiting access rate for Station L? | 通过 (无共识错误) |
| google/gemma-3-4b-it | Is my proxy quality not good, and is access rate limited to L station? |
严重
[准确性]
"L station"
理由: The hypothesis omits the second sentence of the content entirely. It also mistranslates 'L 站' (L Station/LinuxDo) as 'L station' without context and misinterprets the grammar of the first sentence. | Incorrect translation - 'L站' refers to 'LINUX DO' (a platform), not 'L station' | The phrase 'access rate limited to L station' is missing and should be included to accurately represent the content. | “L站” in this context refers to LINUX DO; leaving as “L station” fails to convey the actual site/service and contradicts the reference. | Should be 'LINUX DO' not 'L station'. Incomplete translation of the acronym. |
| google/translategemma-12b-it | Will poor proxy quality affect L station access speed and limit access? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Poor proxy quality can affect L station access rate and cause restrictions? Content: Is my proxy quality poor, and I often encounter this situation: I don't seem to have installed any scripts (can't install these either, feels like they're normal functions) |
严重
[准确性]
"L station"
理由: The reference clarifies it refers to ‘LINUX DO’; “L站” is a specific site/service, not literally ‘L station’. Rendering as ‘L station’ fails to disambiguate the intended service. | Should be 'LINUX DO' per reference, not 'L station'. | Mistranslation of the slang term 'L 站' (referring to LINUX DO). The reference correctly identifies the entity, while the hypothesis translates it literally as 'L station', losing the specific meaning. | Unclear translation. Reference shows 'LINUX DO' which is a specific platform name. 'L station' is too vague. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '机场质量不行会影响L站访问速率受限?' in en", "zh-TW": "Translation of '机场質量不行會影響L站訪問速率受限?' in zh-TW", "ja": "Translation of '机场質量不行がL駅のアクセス速度制限に影響する?' in ja", "ko": "Translation of '공항 품질이 좋지 않으면 L역의 접속 속도 제한에 영향을 미칠까?' in ko", "de": "Translation of 'Mangelnder Qualität des Flughafens beeinträchtigt die Limits der Zugriffsgeschwindigkeit an Station L?' in de", "fr": "Translation of 'La qualité du site d'aéroport insuffisante affecte la vitesse limitée de l'accès à la station L?' in fr", "it": "Translation of 'La qualità dell' aeroporto non buona influisce sulla velocità massima di accesso alla stazione L?' in it} |
致命
[准确性]
""en": "Translation of '机场质量不行会影响L站访问速率受限?' in en"..."
理由: The hypothesis is a malformed JSON snippet listing translation tasks for multiple languages instead of providing the actual translation. | Hypothesis is metadata/template text rather than actual translation. Should provide actual translated content. | The translation is not provided and only includes a placeholder. |
| tencent/HY-MT1.5-7B | "en": "Title: Does Poor Airport Quality Affect the Limited Access Speed of L Station?", "zh-TW": "标题:机场质量差会影响到L站的访问速度受限吗?", "ja": "タイトル:空港の品質が悪いとLステーションのアクセス速度が制限されるのでしょうか?", "ko": "제목: 공항의 품질이 좋지 않으면 L스테이션의 접속 속도가 제한될까요?", "de":="Titel: Beeinträchtigt eine schlechte Flughafenqualität die eingeschränkte Zugriffsgeschwindigkeit der L-Station?", "fr":="Titre : La qualité médiocre de l’aéroport affecte-t-elle la vitesse d’accès limitée de la station L ?", "it":="Titolo: La qualità scadente dell’aeroporto influisce sulla velocità di accesso limitata della stazione L?"}|im_end|> |
严重
[准确性]
"Airport"
理由: Incorrect: '机场' (proxy provider) was translated as 'Airport' which is wrong; reference uses 'proxy provider'. Also 'L Station' should be 'LINUX DO' (the platform name) | Hypothesis contains multiple language versions. Should be 'Will a poor quality proxy provider affect or limit the access rate to LINUX DO?' | Misinterprets slang '机场' which here means network/proxy provider, not a physical airport. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Tutorial】0 yuan exclusive "Little Crabs" OpenClaw, 2-core 16GB configuration / painless deployment / ready to use out of the box |
严重
[术语]
"Little Crabs"
理由: Should be 'OpenClaw' not 'Little Crabs'. '小龙虾' is slang for OpenClaw. | Mistranslation of '小龙虾' (OpenClaw). Should remain OpenClaw or be translated as crayfish/lobster in context, not 'Little Crabs'. | Incorrect translation of '小龙虾' as 'Little Crabs' instead of 'OpenClaw'. |
| CohereLabs/tiny-aya-water | Title: 【Tutorial】0 yuan exclusive “Little Crab” OpenClaw, 2Cores16GB configuration / painless deployment / open source |
严重
[准确性]
"open source"
理由: Should be 'out-of-the-box ready' not 'open source'. '开箱即用' means ready to use immediately, not open source. | Mistranslates "开箱即用" (out-of-the-box ready) as "open source", changing the meaning. | Reference says 'out-of-the-box ready' not 'open source' | Mistranslation of '开箱即用' (out-of-the-box ready) as 'open source'. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Tutorial] Adopt Exclusive 'Little Lobster' OpenClaw for Free, Configured with 2 Cores and 16GB RAM / Seamless Deployment / Ready to Use Content: A Linux container pre-installed with OpenClaw and equipped with a desktop environment, compatible with free deployment services like ModelScope and HuggingFace, allowing smooth experience through browser |
严重
[术语]
"Exclusive 'Little Lobster' OpenClaw"
理由: Should be 'OpenClaw' not 'Little Lobster'. While 'lobster' is slang, the proper product name is 'OpenClaw'. | Incorrect translation of '小龙虾' as 'Little Lobster' instead of 'OpenClaw'. | Mixes mascot nickname 'Little Lobster' with brand; understandable but slightly informal compared to using just OpenClaw. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Little Lobster' |
| Qwen/Qwen2.5-7B-Instruct | Title: [Tutorial] Get Your Exclusive ‘Shrimp’ OpenClaw for Free, 2-Core 16GB Config/No Pain Deployment/Plug and Play Content: A Linux container pre-installed with OpenClaw and desktop environment, compatible with free container deployment services like ModelScope and HuggingFace, experience OpenClaw directly through your browser. |
严重
[术语]
"‘Shrimp’"
理由: Should be 'OpenClaw' not 'Shrimp' - incorrect terminology | In this forum's context, “小龙虾” is a nickname for OpenClaw and should not be translated literally as “Shrimp”; it should be kept as OpenClaw or Lobster/OpenClaw. | Incorrect translation of '小龙虾' as 'Shrimp' instead of 'OpenClaw'. |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Severe omission. The hypothesis contains only ellipsis and generation artifacts, failing to translate the Title or Content. | Critical: The hypothesis is essentially empty/malformed - just contains '...}' which is not a translation at all. Missing the entire content. | Hypothesis is malformed and empty; no translation of title or content is provided. | Hypothesis is incomplete/malformed. Contains only '...}' instead of a proper translation in Chinese. |
| Qwen/Qwen3.5-4B | "en": "Title: [Tutorial] Adopt OpenClaw 'Crayfish' for free - 2 cores / 16GB RAM / painless deployment / ready to use out of the box\nContent: A Linux container pre-installed with OpenClaw and a desktop environment, compatible with ModelScope, HuggingFace, and other free container deployment services. Experience OpenClaw directly through your browser.", "zh-TW": "標題:【教學】免費領養專屬「小龍蝦」OpenClaw,2 核 16GB 配置/無痛部署/開箱即用\n內容:一個預裝 OpenClaw 並具備桌面環境的 Linux 容器,支援 ModelScope、HuggingFace 等免費容器部署服務,透過瀏覽器即可暢玩體驗 OpenClaw", "ja": "タイトル:[チュートリアル] オープンクラウの「カニ」を無料で採用 - 2 コア / 16GB RAM / 無難なデプロイ / 箱からすぐに使える\n内容:オープンクラウとデスクトップ環境がプリインストールされた Linux コンテナ。ModelScope、HuggingFace およびその他の無料コンテナデプロイメントサービスに対応しています。ブラウザを通じてオープンクラウを楽しめます。", "ko": "제목: [튜토리얼] 무료로 전용 \"작은 새우\" OpenClaw 입양 - 2 코어 / 16GB RAM / 통증 없는 배포 / 박스 바로 사용\n내용: 오픈클로우와 데스크톱 환경이 사전 설치된 Linux 컨테이너입니다. ModelScope, HuggingFace 등 무료 컨테이너 배포 서비스에 호환되며 브라우저에서 바로 OpenClaw 를 즐길 수 있습니다.", "de": "Titel: [Tutorial] OpenClaw \"Krabben\" kostenlos adoptieren - 2 Kerne / 16 GB RAM / schmerzloser Deployment / sofort einsatzbereit\nInhalt: Ein Linux-Container mit vorinstalliertem OpenClaw und einem Desktop-Umfeld, kompatibel mit ModelScope, HuggingFace und anderen kostenlosen Container-Bereitstellungsdiensten. Erleben Sie OpenClaw direkt über Ihren Browser.", "fr": "Titre : [Tutoriel] Adoptez gratuitement le « Crayfish » OpenClaw exclusif - 2 cœurs / 16 Go RAM / déploiement sans douleur / prêt à l'emploi hors boîte\Contenu : Un conteneur Linux pré-installé avec OpenClaw et un environnement de bureau, compatible avec ModelScope, HuggingFace et d'autres services de déploiement de conteneurs gratuits. 
Profitez d'OpenClaw directement via votre navigateur.", "it": "Titolo: [Tutorial] Adotta gratuitamente il \"Gambero\" OpenClaw esclusivo - 2 nuclei / 16GB RAM / deploy senza dolore / pronto all'uso fuori dalla scatola\nContenuto: Un contenitore Linux preinstallato con OpenClaw e un ambiente desktop, compatibile con ModelScope, HuggingFace e altri servizi di distribuzione dei contenitori gratuiti. Goditi OpenClaw direttamente dal tuo browser." } |
严重
[术语]
"Crayfish"
理由: Should be 'OpenClaw' instead of 'Crayfish'. | Should be 'OpenClaw' not 'Crayfish' as per reference | In the English section of the output, '小龙虾' is translated as 'Crayfish' instead of the established term 'OpenClaw'. |
| Qwen/Qwen3.5-9B | Title: [Tutorial] Adopt Exclusive "Crayfish" OpenClaw for Free (0 Yuan), 2 Cores 16GB Configuration / Seamless Deployment / Ready to Use Out of the Box Content: A Linux container pre-installed with OpenClaw and featuring a desktop environment, compatible with free container deployment services like ModelScope and HuggingFace. Experience OpenClaw directly through your browser. |
轻微
[术语]
""Crayfish" OpenClaw"
理由: Should be translated as 'OpenClaw' instead of 'Crayfish' to maintain consistency with the reference. | Should be 'OpenClaw'. 'Crayfish' is the literal meaning of the slang nickname. | Mixes literal “Crayfish” with product name OpenClaw; slang reference is acceptable but less precise than just OpenClaw as brand. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | A guide to obtaining an exclusive 'small shrimp' OpenClaw with a 2-core 16GB configuration for free, painless deployment, and ready-to-use out-of-the-box. |
严重
[术语]
"A guide to obtaining an exclusive 'small shrimp' OpenClaw"
理由: Incorrectly translates the slang '小龙虾' as 'small shrimp' alongside 'OpenClaw', creating redundancy and confusion. It should just be 'OpenClaw'. | '小龙虾' is OpenClaw's nickname; duplicating as "'small shrimp' OpenClaw" is misleading. Better to use 'OpenClaw' alone. | Incorrect: '小龙虾' is slang for 'OpenClaw', should be 'OpenClaw' not 'small shrimp' | Should be 'OpenClaw' not 'small shrimp'. '小龙虾' is slang terminology for OpenClaw. |
| google/gemma-3-12b-it | Title: [Tutorial] Adopt Your Exclusive "Little Lobster" OpenClaw for Free - 2 Cores, 16 GB RAM / Painless Deployment / Out-of-the-Box Content: A Linux container pre-installed with OpenClaw and a desktop environment, compatible with free container deployment services like ModelScope and Hugging Face, allowing you to enjoy OpenClaw through your browser. |
轻微
[术语]
"Little Lobster"
理由: Should be 'OpenClaw' instead of 'Little Lobster' | Unnecessary literal translation of the nickname; 'OpenClaw' is sufficient and clearer. | Should be 'OpenClaw' not 'Little Lobster' - inconsistent terminology | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Little Lobster' |
| google/gemma-3-1b-it | Tutorial: [Title] - A Linux container with OpenClaw pre-installed, featuring 0 yuan adoption of exclusive "Little Shrimp" (ModelScope, Hugging Face), and seamless deployment/ready to use without pain. |
严重
[准确性]
"Tutorial: [Title] - A Linux container with OpenClaw pre-installed, featuring 0 yuan adoption of exclusive "Little Shrimp" (ModelScope, Hugging Face), and seamless deployment/ready to use without pain."
理由: The hypothesis incorrectly structures the title and content. It includes '[Title]' as literal text. It misinterprets '0 yuan adoption' as a feature of the container alongside ModelScope/HuggingFace in a confusing way. It omits the '2 cores 16GB config' and 'desktop environment' details. | Title not properly translated; mixes elements (0 yuan adoption, seamless deployment) that belong in title but are merged into content and injects ModelScope/Hugging Face as part of the adoption phrase. Omits 'desktop environment' and that ModelScope/HF are deployment services, and loses the browser-based usage detail. | Title incomplete - missing '0元领养专属' (adopt for free) and '2核16GB配置/无痛部署/开箱即用' details |
| google/gemma-3-4b-it | Title: [Tutorial] Free Adoption Exclusive ‘Little Lobster’ OpenClaw, 2 Core 16GB Configuration / Seamless Deployment / Ready to Use |
轻微
[术语]
"Free Adoption Exclusive ‘Little Lobster’"
理由: “小龙虾” is slang for OpenClaw; here it might be clearer to retain the product name rather than an animal nickname, though meaning is partially implied by including OpenClaw. | The term 'Little Lobster' is too informal and does not accurately represent the specific product 'OpenClaw'. | Should be 'OpenClaw' not 'Little Lobster'. While '小龙虾' means little lobster, in this context it refers to OpenClaw. | Incorrect translation - 'Little Lobster' should be 'OpenClaw' (小龙虾 is a nickname for OpenClaw) |
| google/translategemma-12b-it | Title: [Tutorial] Adopt your own "Little Lobster" OpenClaw for free, with 2 cores and 16GB configuration / Easy deployment / Ready to use out of the box Content: A Linux container pre-installed with OpenClaw and a desktop environment, compatible with free container deployment services such as ModelScope and HuggingFace, allowing you to enjoy and experience OpenClaw directly through your browser. | 通过 (无共识错误) |
| google/translategemma-4b-it | "en": "Title: 【Tutorial】Get your own \"Little Lobster\" for free via OpenClaw, 2-core 16GB configuration / Seamless deployment / Ready to use", "zh-TW": "標題:【教學】免費領取專屬「小龍蝦」OpenClaw,2核心16GB配置/無縫部署/即用", "ja": "タイトル:【チュートリアル】OpenClawで無料の「小さなエビ」を手に入れる、2コア16GB構成/シームレスな展開/すぐに使える", "ko": "제목: [튜토리얼] OpenClaw를 통해 무료로 '작은 게' 획득, 2코어 16GB 구성/무난한 배포/즉시 사용", "de": "Titel: \[Tutorial] Kostenloses Erhalten eines eigenen \"Kleinen Hummer\" über OpenClaw, 2-Kern-16GB-Konfiguration / Nahtlose Bereitstellung / Sofort einsatzbereit", "fr": "Titre : \[Tutoriel] Obtenir un propre \"Petit Homard\" gratuitement via OpenClaw, configuration 2 cœurs 16 Go / Déploiement sans problème / Prêt à l'emploi", "it": "Titolo: \[Tutorial] Ottieni il tuo proprio \"Piccolo Gambero\" gratuitamente tramite OpenClaw, configurazione 2 core 16 GB / Distribuzione senza problemi / Pronto all'uso" } <|im_end|> |
严重
[术语]
""Little Lobster""
理由: The nickname “小龙虾” is the community nickname for OpenClaw itself; the reference keeps focus on adopting OpenClaw, not a generic ‘Little Lobster’. This can mislead readers away from the product name. | Should be 'OpenClaw' not 'Little Lobster' per reference. | Incorrect translation of '小龙虾' as 'Little Lobster' instead of 'OpenClaw'. | While 'Little Lobster' is a valid translation of '小龙虾', the reference uses 'OpenClaw' which is the actual product name. Should prioritize product name. |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
致命
[准确性]
"Translation of '登录' in en"
理由: Hypothesis is completely unrelated to source. Source discusses OpenClaw deployment tutorial, not login (登录). | The hypothesis fails to translate the source text about OpenClaw tutorials, outputting an unrelated meta-string about 'Login'. | Placeholder not related to the tutorial about adopting OpenClaw; no source meaning is conveyed. | Complete mistranslation - source is about OpenClaw tutorial for deployment, hypothesis translates '登录' (login) which is completely unrelated |
| tencent/HY-MT1.5-7B | "en": "Title: [Tutorial] Get Exclusive ‘Little Lobster’ OpenClaw for Free at 0 Yuan, 2-Core 16GB Configuration / Painless Deployment / Ready to Use Out of the Box", "zh-TW": "Title: [教程] 免費獲得專屬的“小龍蝦”OpenClaw,2核16GB配置/無痛部署/開箱即用", "ja": "Title: [チュートリアル] 0円で専用の「小龍蝦」OpenClawを無料取得できます。2コア16GB構成/簡単なデプロイメント/開箱すぐに使用可能", "ko": "Title: [튜토리얼] 0원으로 전용 “소형 랍스터” OpenClaw를 무료로 얻을 수 있습니다. 2코어 16GB 구성/편리한 배포/개봉 즉시 사용 가능", "de":="Titel: [Tutorial] Erhalten Sie den exklusiven „Kleine Hummer“ OpenClaw kostenlos bei 0 Euro, 2-Kern-16GB-Ausstattung / Schmerzlose Bereitstellung / Sofort einsatzbereit nach Auspacken", "fr":="Titre: [Tutoriel] Obtenez gratuitement l’OpenClaw „Petit Homme de Mer“ exclusif, configuration 2 cœurs 16 GB / Déploiement sans effort / Prêt à l’emploi dès l’ouverture", "it":="Titolo: [Tutoriale] Ottenete gratuitamente l’OpenClaw “Piccolo Polpo” esclusivo, configurazione 2 core 16 GB / Distribuzione senza problemi / Pronto all’uso appena aperto"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: Repository link: GitHub - iBUHub/AIStudioToAPI: A wrapper that exposes Google AI Studio as OpenAI, Gemini, and Anthropic compatible APIs.(A tool that wraps Google AI Studio to expose it as OpenAI / Gemini / Anthropic style APIs) | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: build Reverse Proxy (AIStudioToAPI) release v1 version, significantly shorten account switching time |
轻微
[流畅性]
"build Reverse Proxy"
理由: Unnatural English noun phrase; should be "Build reverse proxy" or "Reverse proxy build"; article/capitalization off. | Awkward phrasing. Should be 'Build version reverse proxy' or 'Build reverse proxy' for better fluency. | Awkward phrasing. 'build' seems to be treated as a proper noun or verb incorrectly. | Incorrect translation of 'build 反代' as 'build Reverse Proxy' instead of 'Build version reverse proxy'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Build Reverse Proxy (AIStudioToAPI) Release v1, Significantly Reduces Account Switching Time Content: Repository Link: GitHub - iBUHub/AIStudioToAPI: A wrapper that exposes Google AI Studio as OpenAI, Gemini, and Anthropic compatible APIs. (A tool that wraps Google AI Studio to be compatible with OpenAI / Gemini / Anthropic style APIs) | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: build reverse proxy (AIStudioToAPI) release v1 version, significantly reduce account switching time Content: Repository link: GitHub - iBUHub/AIStudioToAPI: A wrapper that exposes Google AI Studio as OpenAI, Gemini, and Anthropic compatible APIs. (A tool to wrap Google AI Studio as APIs compatible with OpenAI / Gemini / Anthropic styles) |
轻微
[流畅性]
"build reverse proxy (AIStudioToAPI)"
理由: In the title, “build” should be capitalized, and combined with “反代” it reads more naturally as “Build reverse proxy”; the current capitalization and structure are slightly unnatural. | Should be 'Build version reverse proxy (AIStudioToAPI) releases v1' - awkward phrasing | Awkward phrasing. 'Build' is likely part of the name or a verb, but the structure is clunky. |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Severe omission. The hypothesis contains only ellipsis and generation artifacts, failing to translate the Title or Content. | Critical: The hypothesis is essentially empty/malformed - just contains '...}' which is not a translation at all. Missing the entire content. | Hypothesis is malformed and empty; no translation of title or content is provided. | Hypothesis is incomplete/malformed. Contains only '...}' instead of a proper translation in Chinese. |
| Qwen/Qwen3.5-4B | Title: Build reverse proxy (AIStudioToAPI) release v1 version, significantly reducing account switching time. Content: Repository link: GitHub - iBUHub/AIStudioToAPI: A wrapper that exposes Google AI Studio as OpenAI, Gemini, and Anthropic-compatible APIs. (A tool that wraps Google AI Studio to expose APIs compatible with OpenAI/Gemini/Anthropic styles) | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Build reverse proxy (AIStudioToAPI) release v1 version, significantly shorten account switching time Content: Repository link: GitHub - iBUHub/AIStudioToAPI: A wrapper that exposes Google AI Studio as OpenAI, Gemini, and Anthropic compatible APIs. (A tool that encapsulates Google AI Studio as APIs compatible with OpenAI/Gemini/Anthropic styles) |
轻微
[流畅性]
"Build reverse proxy (AIStudioToAPI) release v1 version"
理由: Awkward English word order; “Build version reverse proxy … releases v1” or similar would be more natural. | Awkward phrasing. 'Build' is likely part of the name or a verb, but 'Build version reverse proxy' (reference) or 'Released v1 of the reverse proxy' flows better. | Should be 'Build version reverse proxy (AIStudioToAPI) releases v1' to match reference |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | build Reverse Proxy (AIStudioToAPI) release v1, significantly reducing account switching time | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Release v1 of build Reverse Proxy (AIStudioToAPI), significantly reducing account switching time\nContent: Repository link: GitHub - iBUHub/AIStudioToAPI: A wrapper that exposes Google AI Studio as OpenAI, Gemini, and Anthropic compatible APIs. (A tool that wraps Google AI Studio to be an API compatible with OpenAI / Gemini / Anthropic styles) | 通过 (无共识错误) |
| google/gemma-3-1b-it | Link to the repository: GitHub - iBUHub/AIStudioToAPI: A wrapper that exposes Google AI Studio as OpenAI, Gemini, and Anthropic compatible APIs. | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: build Reverse Proxy (AIStudioToAPI) release v1 version, significantly reduce account switching time |
轻微
[流畅性]
"build Reverse Proxy"
理由: “build 反代” is better rendered as “build reverse proxy” or “build version reverse proxy”; current capitalization/structure is slightly awkward but understandable. | The capitalization and phrasing are slightly off. 'Build version' or 'Released v1 of the reverse proxy' flows better. | Reference uses 'Build version reverse proxy' which is slightly clearer. |
| google/translategemma-12b-it | Title: Release v1 of build Reverse Proxy (AIStudioToAPI), significantly reducing account switching time. Content: Repository link: GitHub - iBUHub/AIStudioToAPI: A wrapper that exposes Google AI Studio as OpenAI, Gemini, and Anthropic compatible APIs. (A tool that wraps Google AI Studio to be an API compatible with OpenAI / Gemini / Anthropic styles.) | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: release v1 version of reverse proxy (AIStudioToAPI), significantly reduce account switching time | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | 標題:建立逆向代理(AIStudioToAPI)已發布v1版,大幅縮短帳號切換時間。 內容:儲存庫連結:GitHub - iBUHub/AIStudioToAPI:一個工具,將Google AI Studio包裝起來,以提供與OpenAI、Gemini和Anthropic相容的API。 | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | "en": "Title: Build Reverse Proxy (AIStudioToAPI) Released Version v1 – Significantly Reduces Account Switching Time", "zh-TW": "Title: 构建反向代理(AIStudioToAPI)版本v1发布——显著减少账户切换时间", "ja": "标题:构建反向代理(AIStudioToAPI)版本v1发布——显著缩短账户切换时间", "ko": "제목: 구축된 반대파 프록시(AIStudioToAPI)버전v1 출시 — 계정 전환 시간 대폭 단축", "de":="Titel: Veröffentlichung der Version v1 des Reverse-Proxies (AIStudioToAPI) – Erhebliche Reduzierung der Zeit für das Wechseln von Konten", "fr": "Titre : Lancement de la version v1 du reverse proxy (AIStudioToAPI) – Réduction significative du temps nécessaire pour changer de compte", "it": "Titolo: Lancio della versione v1 del reverse proxy (AIStudioToAPI) – Riduzione significativa del tempo necessario per cambiare account"}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Small Crabs】Enterprise WeChat officially supports access |
严重
[术语]
"Small Crabs"
理由: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Small Crabs' | Mistranslation of '小龙虾' (OpenClaw). | Incorrect translation of '小龙虾' as 'Small Crabs' instead of 'OpenClaw'. | Should be 'OpenClaw' not 'Small Crabs'. '小龙虾' is slang for OpenClaw in this context. |
| CohereLabs/tiny-aya-water | Title: 【Shrimp Crab】Official Enterprise WeChat support access |
严重
[术语]
"【Shrimp Crab】"
理由: The source and reference refer to "OpenClaw"/"小龙虾"; "Shrimp Crab" is incorrect and confuses the product name. | Incorrect translation - '小龙虾' refers to OpenClaw, not 'Shrimp Crab' | Incorrect translation of '小龙虾' as 'Shrimp Crab' instead of 'OpenClaw'. | Should be 'OpenClaw' not 'Shrimp Crab'. '小龙虾' is slang for OpenClaw. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Little Lobster] Enterprise WeChat Official Integration Available Content: Smooth integration |
严重
[术语]
"[Little Lobster]"
理由: Incorrect translation of '小龙虾' as 'Little Lobster' instead of 'OpenClaw'. | Title tag should use the product name OpenClaw rather than mascot translation to match reference and common usage. | Should be 'OpenClaw' not 'Little Lobster'. The proper product name is 'OpenClaw'. | Failed to translate the slang '小龙虾' to the specific product name 'OpenClaw'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Little Lobster' |
| Qwen/Qwen2.5-7B-Instruct | Title: [Shrimp Enterprise WeChat] Official Support Integrated Content: Integration is very smooth |
严重
[术语]
"[Shrimp Enterprise WeChat]"
理由: Should be '[OpenClaw] WeChat Work' not '[Shrimp Enterprise WeChat]'. 小龙虾 refers to OpenClaw, and 企业微信 is WeChat Work. | The original title is “【小龙虾】企业微信官方支持接入了” and should be rendered as “[OpenClaw] Enterprise WeChat ...”; “Shrimp Enterprise WeChat” is ambiguous. | Should be 'OpenClaw' not 'Shrimp'; 'Enterprise WeChat' should be 'WeChat Work' - incorrect terminology | Incorrect translation of '小龙虾' as 'Shrimp' instead of 'OpenClaw'. |
| Qwen/Qwen3-14B | 【Gambo】Supporto ufficiale WeChat Enterprise integrazione |
致命
[准确性]
"【Gambo】"
理由: Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. Also 'Gambo' is incorrect for '小龙虾'. | Completely wrong entity; should reflect 小龙虾 / OpenClaw, not ‘Gambo’. | Incorrect translation of '小龙虾' as 'Gambo' instead of 'OpenClaw' | Critical mistranslation: '小龙虾' is the nickname for OpenClaw, not 'Gambo' (which means 'leg/stem' in Italian). This completely mistranslates the entity name. |
| Qwen/Qwen3.5-4B | Title: [Crayfish] Enterprise WeChat official support has integrated Content: Integration is very smooth |
轻微
[流畅性]
"Enterprise WeChat official support has integrated"
理由: Awkward phrasing. 'WeChat Work now officially supports integration' (reference) is more natural. | Should be 'WeChat Work now officially supports integration' for better fluency | Reference uses 'WeChat Work now officially supports integration' - slightly different phrasing
严重
[术语]
"[Crayfish]"
理由: Should be 'OpenClaw' instead of 'Crayfish'. | Should be 'OpenClaw' not 'Crayfish' as per reference | Mistranslation of '小龙虾' as 'Crayfish' instead of 'OpenClaw'. |
| Qwen/Qwen3.5-9B | Title: [Crayfish] Enterprise WeChat official support has been integrated Content: The integration is very smooth |
严重
[术语]
"Crayfish"
理由: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Crayfish' | Should be translated as 'OpenClaw' instead of 'Crayfish' to maintain consistency with the reference. | Should be 'OpenClaw'. The title uses the slang '小龙虾' which refers to the specific project. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Login has become very smooth. |
严重
[准确性]
"Login has become very smooth."
理由: Incorrect: The title '[小龙虾]企业微信官方支持接入了' is about OpenClaw WeChat Work integration, not 'Login' | Changes meaning from 'integration is very smooth' to 'login has become very smooth', and omits mention of WeChat Work/enterprise WeChat support. | Should be 'The integration is very smooth' - hypothesis misinterprets the content | Complete hallucination/misinterpretation. The source talks about 'WeChat Work official support integration' (Title) and 'Integration is smooth' (Content). The hypothesis translates 'Login' which is not in the source, and misses the main topic of WeChat Work. |
| google/gemma-3-12b-it | Title: [Crayfish] Enterprise WeChat official support has been connected |
严重
[术语]
"[Crayfish]"
理由: Tag should preserve the product name 'OpenClaw'; translating it as 'Crayfish' drops the brand reference. | Should be 'OpenClaw' instead of 'Crayfish' | Should be 'OpenClaw' not 'Crayfish' - incorrect terminology | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Crayfish' | Incorrectly translated the project nickname '小龙虾' as 'Crayfish' instead of 'OpenClaw'. |
| google/gemma-3-1b-it | Integration is smooth as silk | 通过 (无共识错误) |
| google/gemma-3-4b-it | "en": "Title: [Small Lobster] Enterprise WeChat officially supports access.", "zh-TW": "標題:【小龍蝦】企業微信正式支援接入。", "ja": "タイトル:【小龍蝦】エンタープライズWeChatの公式連携をサポート。", "ko": "제목: 【소새우】 기업용 위챗 공식 지원 연동.", "de": "Titel: \[Kleinflunderschrimp] Die offizielle Enterprise WeChat-Integration wird unterstützt.", "fr": "Titre : \[Crevette Petite] L'accès à Enterprise WeChat est désormais pris en charge officiellement.", "it": "Titolo: \[Gamberetto Piccolo] Supporto ufficiale per l'accesso a Enterprise WeChat." } <|im_end|> |
严重
[术语]
"[Small Lobster]"
理由: The term 'Little Lobster' is too informal and does not accurately represent the specific product 'OpenClaw'. | “小龙虾” is the community nickname for OpenClaw; using literal “Small Lobster” in this app/integration context loses the well-known product reference. | Should be 'OpenClaw' not 'Small Lobster'. Reference correctly identifies this as OpenClaw. | Incorrect translation - 'Small Lobster' should be 'OpenClaw' (小龙虾 is a nickname for OpenClaw) |
| google/translategemma-12b-it | Title: [Crayfish] Enterprise WeChat official support has been integrated. |
严重
[术语]
"[Crayfish]"
理由: Title tag should be "[OpenClaw]"; using "Crayfish" loses the established product name and ties to the ecosystem. | Failed to translate the slang '小龙虾' as the specific product name 'OpenClaw'. 'Crayfish' is a literal biological translation that loses the technical context. | Incorrect translation - should be 'OpenClaw' not 'Crayfish' | Incorrect translation of '小龙虾' as 'Crayfish' instead of 'OpenClaw'. |
| google/translategemma-4b-it | Title: [Small Lobster] Enterprise WeChat officially supports access. |
严重
[术语]
"[Small Lobster]"
理由: Should be 'OpenClaw' not 'Small Lobster' per reference. | Incorrect translation of '小龙虾' as 'Small Lobster' instead of 'OpenClaw'. | “小龙虾” is the community name for OpenClaw; translating literally as ‘Small Lobster’ loses the product’s proper name, which the reference preserves as OpenClaw. |
| tencent/HY-MT1.5-1.8B | Translation of '小龙虾' in English |
致命
[准确性]
"Translation of '小龙虾' in English"
理由: Hypothesis is a placeholder template rather than actual translation; missing title and content translation | The hypothesis outputs a meta-instruction string instead of translating the source text about WeChat Work integration. | Hypothesis is metadata/template text rather than actual translation. Should provide actual translated title and content. | Meta phrase, not a translation; it omits the whole message about OpenClaw and WeChat Work integration. |
| tencent/HY-MT1.5-7B | Title: [Small Lobster] Official WeCom Support Has Been Added |
严重
[术语]
"[Small Lobster]"
理由: Should use product name 'OpenClaw' instead of literal 'Small Lobster' for clarity and consistency with reference. | Should be 'OpenClaw' not 'Small Lobster'. The source uses '小龙虾' as a nickname for OpenClaw | Failed to recognize '小龙虾' as the tool name 'OpenClaw'. 'Small Lobster' sounds like a seafood dish, not a software integration. | Incorrect translation of '小龙虾' as 'Small Lobster' instead of 'OpenClaw'. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: OpenClaw AI Agent Small Crab Capability Ranking |
严重
[术语]
"Small Crab"
理由: Incorrect translation of '小龙虾' - should be omitted as in reference 'OpenClaw AI Agent Capability Ranking' | Mistranslation of '小龙虾' (OpenClaw). | Should be 'OpenClaw' not 'Small Crab'. '小龙虾' is slang for OpenClaw. Reference correctly omits this term. |
| CohereLabs/tiny-aya-water | Title: OpenClaw AI Agent Small Crab Capability Ranking |
严重
[术语]
"Small Crab"
理由: Incorrect translation - '小龙虾' refers to OpenClaw, not 'Small Crab' | Unnecessary addition; reference title is "OpenClaw AI Agent Capability Ranking". "Small Crab" introduces an incorrect metaphor/alias. | Incorrect translation of '能力' as 'Small Crab' instead of 'Capability'. | Should omit 'Small Crab'. The reference title is 'OpenClaw AI Agent Capability Ranking' without the slang term. |
| Qwen/Qwen2.5-14B-Instruct | Title: OpenClaw AI Agent Lobster Skill Ranking Content: Specifically tests the success rate of various large models executing actual coding tasks within the OpenClaw framework. Runs each model through a standardized set of OpenClaw agent tasks... |
严重
[术语]
"Lobster Skill Ranking"
理由: Incorrect translation of '小龙虾' as 'Lobster' instead of 'OpenClaw'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Lobster' | Literal translation of '小龙虾能力'. Should be 'OpenClaw Capability Ranking'. 'Skill' is also less accurate than 'Capability' for '能力' in this context. | Should be 'Capability Ranking' not 'Lobster Skill Ranking'. The reference uses 'Capability Ranking'. | '小龙虾能力排行榜' refers to OpenClaw AI Agent capability ranking; 'Lobster Skill' over-literalizes the mascot and may obscure the product focus. |
| Qwen/Qwen2.5-7B-Instruct | Title: OpenClaw AI Agent Little Lobster Capability Ranking Content: Specifically tests the success rate of various large models executing actual coding tasks within the OpenClaw framework. Runs each model using a standardized set of OpenClaw Agent tasks... |
严重
[术语]
"Little Lobster Capability Ranking"
理由: Should be 'OpenClaw AI Agent Capability Ranking' - 'Little Lobster' is redundant with OpenClaw | Should be 'OpenClaw' not 'Little Lobster' - incorrect terminology | Incorrect translation of '小龙虾' as 'Little Lobster' instead of 'OpenClaw'. | “小龙虾能力排行榜” refers to the Agent capability leaderboard under the OpenClaw framework; it should be “OpenClaw AI Agent Capability Ranking”. Rendering “小龙虾” literally and adding “Little” makes it read as a real animal. |
| Qwen/Qwen3-14B | Classifica delle capacità dell'agente AI OpenClaw Granchio |
轻微
[准确性]
"Granchio"
理由: Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | Severe omission. The hypothesis only translates the title, completely missing the content describing the testing methodology and success rates. | Accuracy: '小龙虾' (crayfish/lobster) is a nickname for OpenClaw, not literal crustaceans. The translation 'Granchio' (crab) is literal and loses the tech context. Should be 'OpenClaw'. | 小龙虾 in this context is the OpenClaw AI agent name, not literal ‘crab’; introduces wrong animal and loses brand/product reference. |
| Qwen/Qwen3.5-4B | Title: OpenClaw AI Agent Crab Ranking Content: Specifically tests the success rate of various large models executing actual coding tasks under the OpenClaw framework. Running standardized OpenClaw Agent tasks to evaluate each model... |
严重
[准确性]
"Crab"
理由: Mistranslation of '小龙虾' (Crayfish/Lobster/OpenClaw) as 'Crab'. Also fails to recognize '小龙虾' as 'OpenClaw' in this context, leading to 'Capability Ranking' being missed. | Translates 小龙虾/龙虾-related branding as “Crab”, which is a different animal and incorrect for the product nickname. | Should be 'OpenClaw AI Agent Capability Ranking' not 'Crab Ranking'; '小龙虾' is product name, not literal translation |
| Qwen/Qwen3.5-9B | Title: OpenClaw AI Agent Shrimp Ability Ranking | Content: Specifically tests the success rate of various large models executing actual coding tasks under the OpenClaw framework. Running standardized OpenClaw agent tasks across different models... |
严重
[术语]
"Shrimp"
理由: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Shrimp' | Should be translated as 'OpenClaw' instead of 'Shrimp' to maintain consistency with the reference and avoid confusion. | Incorrect translation of '小龙虾' (crayfish/lobster slang for OpenClaw). 'Shrimp' is a different creature and breaks the specific terminology. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: OpenClaw AI Agent Small Shrimp Ability Ranking List |
严重
[术语]
"Small Shrimp"
理由: Failed to translate the slang '小龙虾' as 'OpenClaw', using literal 'Small Shrimp' instead. | '小龙虾' refers to OpenClaw brand, not literal small shrimp; using 'Small Shrimp' is misleading in this technical context. | Incorrect translation of '小龙虾能力排行榜' as 'Small Shrimp Ability Ranking' instead of 'Capability Ranking'. | Incorrect: '小龙虾' is slang for 'OpenClaw', should be 'OpenClaw' not 'Small Shrimp' | Should be 'OpenClaw AI Agent Capability Ranking' not 'Small Shrimp Ability Ranking List'. '小龙虾' is slang for OpenClaw. |
| google/gemma-3-12b-it | Title: OpenClaw AI Agent Crayfish Ability Ranking Content: Specifically tests the success rate of various large models performing actual coding tasks under the OpenClaw framework. Run different models using a standardized set of OpenClaw Agent tasks... |
严重
[术语]
"Crayfish Ability Ranking"
理由: Should be 'OpenClaw' instead of 'Crayfish' | Incorrectly translated '小龙虾' as 'Crayfish' and added it to the title redundantly; should be 'OpenClaw AI Agent Capability Ranking'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Crayfish' | Should be 'Capability Ranking' not 'Crayfish Ability Ranking' - redundant and incorrect | Original is a capability ranking for OpenClaw AI Agent; adding 'Crayfish' mis-focuses on the mascot instead of the agent capability. |
| google/gemma-3-1b-it | OpenClaw AI Agent - Ranking of Successful Coding Tasks by Different Models |
轻微
[准确性]
"OpenClaw AI Agent - Ranking of Successful Coding Tasks by Different Models"
理由: Hypothesis misinterprets the title. It should be 'Capability Ranking' not 'Ranking of Successful Coding Tasks by Different Models'. The reference correctly captures the meaning. | The title translation is slightly off; 'Capability Ranking' (reference) is more accurate than 'Ranking of Successful Coding Tasks'. The hypothesis also completely omits the 'Content' section which explains the methodology. | Captures the idea but shifts nuance from 'capability ranking' to 'ranking of successful coding tasks'; still generally accurate but slightly narrower. |
| google/gemma-3-4b-it | Title: OpenClaw AI Agent Crab Ranking - Ability Leaderboard Content: Specifically tests the success rate of various large models in executing real coding tasks under the OpenClaw framework. Running each model using a standardized OpenClaw Agent task... | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: OpenClaw AI Agent Small Lobster Capability Ranking Content: Specifically tests the success rate of various large models in performing actual coding tasks within the OpenClaw framework. Uses a standardized set of OpenClaw Agent tasks to evaluate each model... |
严重
[术语]
"Small Lobster"
理由: Failed to recognize '小龙虾' as the alias for 'OpenClaw'. Translating it literally as 'Small Lobster' makes the title nonsensical in a technical context. | Incorrect translation - should be 'OpenClaw' not 'Small Lobster' | "小龙虾" here is the nickname for OpenClaw; adding it into the product name as "AI Agent Small Lobster Capability Ranking" is redundant and confusing vs. standard "OpenClaw AI Agent Capability Ranking". | Should be 'OpenClaw' to match the reference. |
| google/translategemma-4b-it | Title: OpenClaw AI Agent Crab Ranking |
严重
[准确性]
"Crab"
理由: Source '小龙虾' means 'Little Lobster' (slang for OpenClaw). Hypothesis translates it as 'Crab', which is biologically incorrect and misses the slang reference. | Source and reference refer to OpenClaw and its ‘能力排行榜’ (capability ranking). ‘Crab Ranking’ introduces an incorrect animal and loses the meaning of capability ranking. | Incorrect translation of '能力' as 'Crab' instead of 'Capability'. | Mistranslation. '小龙虾' is not 'Crab'. Should be 'OpenClaw AI Agent Capability Ranking' or 'Lobster Ranking'. |
| tencent/HY-MT1.5-1.8B | Translation of 'OpenClaw AI Agent 小龙虾能力排行榜' in en |
致命
[准确性]
"Translation of 'OpenClaw AI Agent 小龙虾能力排行榜' in en"
理由: The translation is not provided and only includes a placeholder. | The hypothesis outputs a meta-instruction string instead of the actual translation of the title and content. | Hypothesis is a placeholder template rather than actual translation; missing content translation | Placeholder text and not an actual translation of the title or description of the capability ranking. | Hypothesis is metadata/template text rather than actual translation. Should provide actual translated content. |
| tencent/HY-MT1.5-7B | OpenClaw AI Agent Lobster Ability Ranking List |
严重
[术语]
"Lobster Ability Ranking List"
理由: Incorrect translation of '小龙虾' as 'Lobster' instead of 'OpenClaw'. | Mistranslation of slang. '小龙虾' here refers to 'OpenClaw'. 'Lobster Ability' sounds like a biological trait rather than a software capability ranking. | Should be 'OpenClaw AI Agent Capability Ranking' - 'Lobster' is unnecessary and '小龙虾' is a nickname, not part of the formal title | Adds 'Lobster' which is not in source or reference and can obscure that this is an AI agent capability ranking. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Experience a week with Dragon Shrimp |
严重
[术语]
"Dragon Shrimp"
理由: Mistranslation of '龙虾' (OpenClaw). 'Dragon Shrimp' is nonsensical in this context. | Incorrect translation of '龙虾' - should be 'OpenClaw' not 'Dragon Shrimp' | Should be 'OpenClaw' not 'Dragon Shrimp'. '龙虾' is slang for OpenClaw. Missing content translation. | Incorrect translation of '小龙虾' as 'Dragon Shrimp' instead of 'OpenClaw'. |
| CohereLabs/tiny-aya-water | Title: Experience Raising a Dragon Crab for a Week |
严重
[术语]
"Dragon Crab"
理由: "龙虾" is lobster; also used as OpenClaw nickname. "Dragon Crab" is incorrect branding/animal. | Incorrect translation of '养龙虾' as 'Raising a Dragon Crab' instead of 'Running OpenClaw'. | Should be 'OpenClaw' not 'Dragon Crab'. '龙虾' is slang for OpenClaw. | Incorrect translation - '龙虾' refers to OpenClaw, not 'Dragon Crab' |
| Qwen/Qwen2.5-14B-Instruct | Una Settimana di Coltivazione di Aragoste |
致命
[准确性]
"Una Settimana di Coltivazione di Aragoste"
理由: Italian sentence instead of target language and content; completely omits mention of OpenClaw and the experiential nuance, changing both language and meaning. | Completely wrong language - hypothesis is in Italian instead of English, and completely mistranslates the content | Translation does not match the source content. | Hypothesis is in Italian, not English. Should be 'My experience running OpenClaw for a week' or similar English translation. | Wrong target language. The output is in Italian, but the target language is Chinese (implying English output based on other items and reference). |
| Qwen/Qwen2.5-7B-Instruct | Title: A Week of Raising Lobsters Content: Now, raising lobsters gives me a Minecraft feel in the Agent world. Recently, lobsters have been trending; during the New Year period, I installed the 24th bottom AMD 300U small box with lobsters and caught up with the trend. Deploying lobsters isn't simple; mainly because the documentation is unclear. |
严重
[术语]
"raising lobsters"
理由: In this context “养龙虾” means running/tinkering with OpenClaw, not actually farming lobsters; the literal translation causes a semantic misreading. | Should be 'running OpenClaw' not 'Raising Lobsters' - incorrect terminology | Should be 'My experience running OpenClaw for a week' - 'Raising Lobsters' is too literal | Incorrect translation of '龙虾' as 'Lobsters' instead of 'OpenClaw'.
严重
[准确性]
"I installed the 24th bottom AMD 300U small box with lobsters"
理由: Should be 'I installed OpenClaw on the AMD 300U mini PC I bought at the end of '24' - '24th bottom' is unclear | Should be 'AMD 300U mini PC I bought at the end of 24' - incorrect temporal reference | Mistranslation of '24 年底' (end of 2024). '24th bottom' is nonsensical. |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Severe omission. The hypothesis contains only ellipsis and generation artifacts, failing to translate the Title or Content. | Critical: The hypothesis is essentially empty/malformed - just contains '...}' which is not a translation at all. Missing the entire content. | Hypothesis is malformed and empty; no translation of title or content is provided. | Hypothesis is incomplete/malformed. Contains only '...}' instead of a proper translation in Chinese. |
| Qwen/Qwen3.5-4B | Title: One Week Experience with Lobster Content: The current version of Lobster gives me the feeling of Minecraft in the Agent world. Recently, Lobster has become very popular. During the Chinese New Year, I installed Lobster on an AMD 300U small box purchased at the end of last year, and I was also riding the wave. Deploying Lobster is not simple; mainly because the documentation is unclear. |
严重
[术语]
"Lobster"
理由: Should be 'OpenClaw' not 'Lobster' as per reference | Consistent mistranslation of the slang '龙虾' (OpenClaw) as 'Lobster' throughout the text. | Should be 'OpenClaw' instead of 'Lobster'. |
| Qwen/Qwen3.5-9B | Title: One Week Experience Raising Lobsters Content: The current lobster gives me a sense of Minecraft in the Agent world. Recently, lobsters have been very popular; during the Spring Festival, I installed the AMD 300U small box bought at the end of 2024 onto the lobster, and also caught up with the trend. Deploying the lobster is not simple, mainly because the documentation is unclear. |
严重
[准确性]
"Raising Lobsters"
理由: Should be 'My experience running OpenClaw for a week' - '龙虾' is slang for OpenClaw, not literal lobsters | Completely missed the slang. '养龙虾' means 'running/using OpenClaw', not literally raising animals. | “养龙虾” refers to running/hosting OpenClaw; literal “raising lobsters” obscures that this is about software, not animals. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Raising shrimp for a week's experience |
严重
[术语]
"Raising shrimp for a week's experience"
理由: Should be 'My experience running OpenClaw for a week' not 'Raising shrimp'. '龙虾' is slang for OpenClaw. | Literal translation of '养龙虾' (raising lobster/shrimp) misses the slang meaning of 'using/running OpenClaw'. Should be 'Running OpenClaw'. | '龙虾' here is a metaphor for OpenClaw, not literal shrimp; translation loses the product reference and context. |
| google/gemma-3-12b-it | Title: A Week with Lobster Content: The current Lobster gives me a Minecraft feeling in the Agent world. Lobster has been very popular recently. I installed Lobster on my AMD 300U small box that I bought at the end of 2024 and caught up with the trend. Deploying Lobster is not simple, mainly because the documentation isn't clear. |
严重
[术语]
"Lobster"
理由: Should be 'OpenClaw' instead of 'Lobster' | Incorrect translation of '龙虾' - should be 'OpenClaw' not 'Lobster' | Consistently translating the product name as 'Lobster' instead of 'OpenClaw' obscures the specific software being discussed. | Consistently failed to translate the slang '龙虾' to the project name 'OpenClaw'. | Should be 'My experience running OpenClaw for a week' not 'A Week with Lobster' |
| google/gemma-3-1b-it | A week of raising lobsters gave me an Agent universe Minecraft feeling. Recently, lobsters have been very popular, and I bought an AMD 300U small box last year to put some lobsters on it, and it's trendy now. |
严重
[准确性]
"A week of raising lobsters gave me an Agent universe Minecraft feeling. Recently, lobsters have been very popular, and I bought an AMD 300U small box last year to put some lobsters on it, and it's trendy now."
理由: Mistranslation. Should be 'I installed OpenClaw on the AMD 300U mini PC I bought at the end of '24' not 'put some lobsters on it'. Also misses the context about documentation clarity issues. | Uses literal 'lobsters' instead of OpenClaw, adds 'universe' not in source, misplaces purchase timing ('last year' vs 'end of 24 around New Year'), and omits the difficulty of deployment and unclear documentation. | The hypothesis omits the last sentence of the source ('龙虾部署不简单,主要是文档写得不明白' - Deploying OpenClaw is not easy, mainly because the documentation is unclear). The phrase 'put some lobsters on it' is a bit awkward compared to 'installed OpenClaw'. |
| google/gemma-3-4b-it | Title: A Week of Raising Lobsters | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: A Week of Using Lobster Content: The current Lobster gives me a feeling similar to Minecraft in the Agent world. Recently, Lobster has been very popular. During the New Year, I installed Lobster on my AMD 300U mini PC that I bought at the end of 2023, and I'm now keeping up with the trend. Installing Lobster is not simple, mainly because the documentation is unclear. |
严重
[术语]
"Lobster"
理由: Incorrect translation - should be 'OpenClaw' throughout | Failed to translate '龙虾' as 'OpenClaw'. 'Using Lobster' sounds like handling a seafood ingredient. | Consistently translating branded "龙虾" here as "Lobster" drops the specific OpenClaw name used in the reference, weakening brand recognition.
严重
[准确性]
"end of 2023"
理由: Source says "24年底" (end of '24); hypothesis uses 2023, which is a temporal inaccuracy. | Should be 'end of '24' not 'end of 2023' based on source context | Source says '24 年底' (end of '24 / 2024). Hypothesis says '2023'. This is a factual error. |
| google/translategemma-4b-it | Title: A Week of Keeping Lobsters |
严重
[术语]
"Keeping Lobsters"
理由: Should be 'running OpenClaw' not 'Keeping Lobsters' per reference. | Incorrect translation of '小龙虾' as 'Lobsters' instead of 'OpenClaw'. | “养龙虾” here metaphorically refers to running/using OpenClaw; translating literally as “Keeping Lobsters” loses the connection to the OpenClaw product. |
| tencent/HY-MT1.5-1.8B | Translation of '养龙虾一周体会' in en |
致命
[准确性]
"Translation of '养龙虾一周体会' in en"
理由: The translation is not provided and only includes a placeholder. | The hypothesis outputs a meta-instruction string instead of translating the user's experience report about OpenClaw. | Meta placeholder; does not provide any of the intended meaning about one week of experience with OpenClaw. | Hypothesis is a placeholder template rather than actual translation; missing content translation | Hypothesis is metadata/template text rather than actual translation. Should provide actual translated content. |
| tencent/HY-MT1.5-7B | Title: Experiences from Raising Lobsters for a Week |
严重
[准确性]
"Experiences from Raising Lobsters for a Week"
理由: Incorrect: '养龙虾' (running OpenClaw) was translated as 'Raising Lobsters'; reference uses 'running OpenClaw'. Also '小龙虾' should be 'OpenClaw'. | Should be 'My experience running OpenClaw for a week' - the source refers to OpenClaw as '龙虾', not literal lobsters | The hypothesis only translates the title and omits the entire content paragraph discussing the experience, hardware, and documentation. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The free API and public proxy provided by this community site will be closed down. |
致命
[准确性]
"The free API and public proxy provided by this community site will be closed down."
理由: Complete hallucination. The source asks why a model isn't working; the hypothesis claims the site will close down. Total meaning inversion. | Completely unrelated to the source, which asks why Claude models on a specific proxy site (with VIP) are unusable; introduces new meaning not present in the source. | Completely inaccurate. The source asks why Claude models don't work on a specific proxy site, not about closure. This is a different topic entirely. | Completely different meaning - hypothesis says the free API will be closed, but reference asks why the Claude models can't be used on the Black and White free API proxy site |
| CohereLabs/tiny-aya-water | The issue may be related to your account status or subscription details. |
致命
[准确性]
"The issue may be related to your account status or subscription details."
理由: Hallucination. The hypothesis provides a generic answer instead of translating the source text. Missing Title translation. | Hypothesis is a generic response unrelated to source. Should be a translation of the title asking why Claude models don't work on the Black and White free API proxy site. | Hypothesis invents a diagnostic statement unrelated to the source question about Claude models on a specific proxy site; no translation of content. | Completely fabricated response - source asks why Claude models can't be used, hypothesis gives unrelated troubleshooting response |
| Qwen/Qwen2.5-14B-Instruct | Title: Why can't I use the claude model on the Black and White Free Shared Site (VIP already activated)! Content: Why can't I use the claude model on the Black and White Free Shared Site (VIP already activated)! | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Why can't I use the Claude model from Black and White Public Service Station (VIP enabled)! |
轻微
[术语]
"Public Service Station"
理由: Should be 'free API proxy site' not 'Public Service Station' | “公益站” is closer to “free API proxy site/free proxy site”; “Public Service Station” does not match this community's usual wording. | '公益站' in this context is better translated as 'free API proxy site' to match the technical domain. |
| Qwen/Qwen3-14B | Title: Why can't I use the Claude model on Black and White Free Shared API Site (VIP enabled)! Content: Why can't I use the Claude model on Black and White Free Shared API Site (VIP enabled)! | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Why can't I use the Claude model on the Free Shared API Site (VIP enabled)? Content: Why can't I use the Claude model on the Free Shared API Site (VIP enabled)? |
严重
[准确性]
"Free Shared API Site"
理由: Omits the specific site name “黑与白公益站 (Black and White)” and translates only generically as “Free Shared API Site”, which can mislead users about which service is affected. | Reference uses plural 'Claude models' not singular 'Claude model' | Mistranslation of '黑与白公益站' (Black and White Free/Charity Station). 'Black and White' is the specific name of the site, omitted in favor of a generic description. | Should be 'Black and White free API proxy site'; '黑与白公益站' is a specific service name |
| Qwen/Qwen3.5-9B | Why can't Claude models work on the Black and White Free Shared API Site (VIP Enabled)!? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Black and White Public Proxy (VIP opened) Claude Model Why Can't It Be Used! |
轻微
[术语]
"Public Proxy"
理由: '公益站' is closer to 'free/public welfare site' or 'free API proxy site'; 'Public Proxy' alone may be ambiguous. | '公益站' is better translated as 'free API proxy site' in this context to match the community jargon. | Should be 'Black and White free API proxy site' - more specific terminology |
| google/gemma-3-12b-it | Why can't I use the Claude model on HeiYuBai public proxy (VIP already opened)? |
轻微
[术语]
"HeiYuBai public proxy"
理由: Should be 'Black and White free API proxy site' not 'HeiYuBai public proxy' | Source '黑与白公益站' is a named free API proxy; transliteration plus 'free API proxy site' would better preserve the proper name and function than 'public proxy' alone. | Should be 'Black and White free API proxy site' instead of 'HeiYuBai public proxy' |
| google/gemma-3-1b-it | Why can't Claude model be used on Black and White公益站 (已开 vip)? |
轻微
[流畅性]
"Why can't Claude model be used on Black and White公益站 (已开 vip)?"
理由: Incomplete translation. Should translate '公益站' as 'free API proxy site' and provide clearer phrasing like reference does. | The hypothesis leaves Chinese characters '公益站' (public welfare station/free site) and '已开 vip' (VIP opened) untranslated in the middle of an English sentence, which breaks fluency. | Missing article 'the' before 'Claude model' and spacing around Chinese segment is awkward; meaning is clear though. |
| google/gemma-3-4b-it | Why can't I use the Claude model on Black & White Public Proxy (with VIP)? | 通过 (无共识错误) |
| google/translategemma-12b-it | Why can't the Claude model be used on the Black and White public API site (VIP already activated)? | 通过 (无共识错误) |
| google/translategemma-4b-it | Why can't I use the Claude model on the Black and White Public API Site (with VIP account)? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '黑与白公益站(已开 vip)的 claude 模型为什么用不了!' in en |
致命
[准确性]
"Translation of '黑与白公益站(已开 vip)的 claude 模型为什么用不了!' in en"
理由: The translation is not provided and only includes a placeholder. | This is only a description of the translation task, not the actual translation itself. | Hypothesis is a placeholder template rather than actual translation; missing content translation | The hypothesis outputs a meta-instruction string instead of the actual translation regarding the Claude model issue. | Hypothesis is metadata/template text rather than actual translation. Should provide actual translated content. |
| tencent/HY-MT1.5-7B | Title: Why can't I use the Claude model on the Black and White Public Welfare Site (VIP already activated)? |
轻微
[术语]
"Public Welfare Site"
理由: Literal translation of '公益站'. Should be 'free API proxy site' or similar to convey the technical meaning. | Literal translation of '公益站'; in this context it denotes a free API proxy site. Adding 'free API proxy' would improve domain accuracy. | Should be 'Why can't I use the Claude models on the Black and White free API proxy site (VIP enabled)!' - more specific terminology | '公益站' translated as 'Public Welfare Site' is literal; reference uses 'free API proxy site' |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: QQ Claw begins internal testing: Achieves dual-end access to WeChat and QQ through "Little Scorpion" |
Major
[Terminology]
"Little Scorpion"
Reason: Mistranslation of '小龙虾' (OpenClaw/Crayfish). 'Scorpion' is a different animal. | Incorrect translation of '小龙虾' as 'Little Scorpion' instead of 'OpenClaw'. | Should be 'OpenClaw' not 'Little Scorpion'. '小龙虾' means 'little crayfish/shrimp' not 'scorpion'. Also should be 'QClaw' not 'QQ Claw'. |
| CohereLabs/tiny-aya-water | Title: QQ Claw begins internal testing: Achieve double-end access to WeChat and QQ "Little Crab" |
Minor
[Terminology]
"QQ Claw begins internal testing"
Reason: Incorrect translation of 'QClaw' as 'QQ Claw' instead of 'QClaw'. | Reference uses 'QClaw' not 'QQ Claw' | Should be 'Tencent's QClaw' for clarity. The source indicates this is Tencent's product. | Source has "QClaw"; "QQ Claw" changes product name slightly and can confuse association with QQ product line.
Major
[Terminology]
""Little Crab""
Reason: Incorrect translation - '小龙虾' refers to OpenClaw, not 'Little Crab' | Again mistranslates "小龙虾" (OpenClaw/little lobster) as crab. | Should be 'OpenClaw' not 'Little Crab'. '小龙虾' is slang for OpenClaw. |
| Qwen/Qwen2.5-14B-Instruct | Title: Tencent QClaw Begins Internal Testing: Achieves Seamless Integration for WeChat and QQ with Mini Lobster Content: Recently, Tencent has been developing an OpenClaw one-click deployment package product called QClaw. After downloading the QClaw application, users can easily deploy the 'Mini Lobster' on their local computers. The 'Mini Lobster' refers to the recently popular AI open-source intelligent agent. |
Major
[Terminology]
"Mini Lobster"
Reason: Should be 'OpenClaw' not 'Mini Lobster'. The proper product name is 'OpenClaw'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Mini Lobster' | Failed to recognize '小龙虾' as the slang for 'OpenClaw', translating it literally as 'Mini Lobster' twice. | Incorrect translation of '小龙虾' as 'Mini Lobster' instead of 'OpenClaw'. |
| Qwen/Qwen2.5-7B-Instruct | Title: Tencent's QClaw Begins Internal Testing: Supports WeChat and QQ Dual-End Access Content: Recently, Tencent has been developing an OpenClaw one-click startup package product called QClaw. After downloading the QClaw application, users can easily deploy 'Dragon Shrimp' with just one click on their local computers. 'Dragon Shrimp' is the recently popular AI open-source agent. |
Major
[Terminology]
"Dragon Shrimp"
Reason: Should be 'OpenClaw' not 'Dragon Shrimp'. While 小龙虾 is slang for OpenClaw, the reference uses OpenClaw directly. | Should be 'OpenClaw' not 'Dragon Shrimp' - incorrect terminology | Incorrect translation of '小龙虾' as 'Dragon Shrimp' instead of 'OpenClaw'. | “小龙虾” is a nickname for OpenClaw; “Dragon Shrimp” is not a commonly used name and obscures which project is meant. |
| Qwen/Qwen3-14B | ...} |
Critical
[Accuracy]
"...}"
Reason: Severe omission. The hypothesis contains only ellipsis and generation artifacts, failing to translate the Title or Content. | Critical: The hypothesis is essentially empty/malformed - just contains '...}' which is not a translation at all. Missing the entire content. | Hypothesis is malformed and empty; no translation of title or content is provided. | Hypothesis is incomplete/malformed. Contains only '...}' instead of a proper translation in Chinese. |
| Qwen/Qwen3.5-4B | Title: Tencent QClaw begins closed beta: Supports dual-end access to WeChat and QQ via 'Little Crawfish'. Content: Tencent is recently developing an OpenClaw one-click startup package product, QClaw. After downloading the QClaw app, you can easily deploy 'Little Crawfish' on your local computer with one click. 'Little Crawfish' refers to the recently popular AI open-source agent. |
Major
[Accuracy]
"'Little Crawfish'"
Reason: Should be 'OpenClaw' not 'Little Crawfish'; '小龙虾' is the product name | The widely used project name is “OpenClaw/小龙虾”; translating only as a literal nickname without mentioning OpenClaw weakens brand identification. | Reference says 'Enabling WeChat and QQ dual-end integration with OpenClaw' - different structure and 'Little Crawfish' should be 'OpenClaw' |
| Qwen/Qwen3.5-9B | Title: Tencent's QClaw begins beta testing: Enables dual access to WeChat and QQ for 'Crayfish' Content: Tencent is currently developing an OpenClaw one-click launcher product called QClaw. After downloading the QClaw app, you can easily deploy 'Crayfish' on your local computer with just one click. 'Crayfish' refers to the recently popular AI open-source agent. |
Major
[Terminology]
"Crayfish"
Reason: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Crayfish' | Should be 'OpenClaw'. Consistent terminology is required for the project name. | Should be translated as 'OpenClaw' instead of 'Crayfish' to maintain consistency with the reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Translation of '龙虾' in en |
Critical
[Accuracy]
"Translation of '龙虾' in en"
Reason: Not a translation: This is clearly an error - the hypothesis is just 'Translation of '龙虾' in en' instead of actual translated content | Hypothesis does not translate the given text at all; it's a meta request about a term, not a translation of the source title and content. | Hypothesis is incomplete and nonsensical. Should provide full translation of title and content about Tencent's QClaw product. |
| google/gemma-3-12b-it | Title: Tencent QClaw Begins Internal Testing: Enables WeChat and QQ Dual-End Access to "Little Lobster" |
Major
[Terminology]
""Little Lobster""
Reason: Should be 'OpenClaw' instead of 'Little Lobster' | Incorrectly translated '小龙虾' as 'Little Lobster' instead of 'OpenClaw'. | The subject is OpenClaw; translating only as 'Little Lobster' removes the explicit OpenClaw branding present in source/reference. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Little Lobster' | Should be 'OpenClaw' not 'Little Lobster' |
| google/gemma-3-1b-it | Tencent recently developed a one-click launch package product called 'QClaw', which is QClaw, download QClaw app after, can easily deploy 'small shrimp' locally on your computer. | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: Tencent QClaw Starts Internal Testing for LinuxDo Community: Enables WeChat and QQ Dual-End Access to ‘Little Lobster’ Content: Tencent is recently developing an OpenClaw one-click package product, QClaw. After downloading the QClaw application, you can easily deploy ‘Little Lobster’ on your local computer with one click. ‘Little Lobster’ refers to the recently popular AI open source intelligent body. |
Major
[Accuracy]
"for LinuxDo Community"
Reason: Hallucinates 'LinuxDo Community' in the title which is not present in the source text. | Incorrect addition - source and reference do not mention 'LinuxDo Community'; reference says 'Rumor has it' not 'for LinuxDo Community' | “传” indicates a rumor; “for LinuxDo Community” is an unsupported addition not present in source, changing the audience context.
Major
[Terminology]
"‘Little Lobster’"
Reason: Using the nickname instead of “OpenClaw” in the title may be slightly less clear to those unfamiliar with the slang, though the body explains it; milder issue. | Translates the slang '小龙虾' literally as 'Little Lobster' instead of using the project name 'OpenClaw'. | Incorrect translation - 'Little Lobster' should be 'OpenClaw' (小龙虾 is a nickname for OpenClaw) | Should be 'OpenClaw' not 'Little Lobster'. Reference correctly uses 'OpenClaw'. |
| google/translategemma-12b-it | Title: Tencent QClaw Beta Launch: Enables WeChat and QQ Dual-Platform Integration with "Xiao Long Xia" |
Major
[Accuracy]
"Beta Launch"
Reason: Source says "开始内测" (started internal testing / beta test), not a public "launch"; "Beta Launch" overstates availability. | Missing 'Rumor has it' and should reference 'OpenClaw' not just 'Xiao Long Xia'. Should be 'Rumor has it Tencent's QClaw has started internal testing: Enabling WeChat and QQ dual-end integration with OpenClaw' | The source title starts with '传' (Rumor/Word has it), indicating unconfirmed news. The hypothesis states it as a fact ('Beta Launch'), missing the nuance of uncertainty. | Reference says 'Rumor has it Tencent's QClaw has started internal testing' not 'Beta Launch'
Major
[Terminology]
""Xiao Long Xia""
Reason: Incorrect translation - should be 'OpenClaw' not 'Xiao Long Xia' | The integrated target is OpenClaw ("小龙虾" as nickname); leaving it as Pinyin without mention of OpenClaw omits the product name and may confuse readers. | Unnecessarily transliterated the slang term. Should be translated as 'OpenClaw' to maintain consistency with the technical context established in other items. | Should be 'OpenClaw' to match the reference. |
| google/translategemma-4b-it | Title: Tencent QClaw begins internal testing: Enables WeChat and QQ dual-end access to "Little Lobster" |
Major
[Terminology]
""Little Lobster""
Reason: Should be 'OpenClaw' not 'Little Lobster' per reference. | While acceptable, reference uses 'OpenClaw' which is the actual product name and should be prioritized. | “小龙虾” again is the nickname for OpenClaw; literal ‘Little Lobster’ loses the product name and can confuse readers. |
| tencent/HY-MT1.5-1.8B | Translation of '传腾讯QClaw开始内测:可实现微信、QQ双端接入‘小龙虾’' into English |
Critical
[Accuracy]
"Translation of '传腾讯QClaw开始内测:可实现微信、QQ双端接入‘小龙虾’' into English"
Reason: Meta instruction text; it fails to translate the rumor about Tencent QClaw internal testing and dual-end integration. | Hypothesis is metadata/template text rather than actual translation. Should provide actual translated content. | The hypothesis outputs a meta-instruction string instead of translating the news about Tencent's QClaw. | Hypothesis is a placeholder template rather than actual translation; missing content translation |
| tencent/HY-MT1.5-7B | "en": "Title: Tencent QClaw Begins Internal Testing: Supports Dual-End Access for WeChat and QQ via ‘Little Lobster’", "zh-TW": "标题:腾讯QClaw开启内部测试:支持通过‘小龙虾’实现微信和QQ的双端接入", "ja": "タイトル:腾讯QClawが内部テストを開始:WeChatとQQの両端アクセスが「小龍蝦」でサポートされる", "ko": "제목: 텐센트 QClaw 내부 테스트 시작: ‘소용돌이새우’를 통한 WeChat과 QQ의 양단 접속 지원", "de":="Titel: Tencent QClaw beginnt interne Tests: Unterstützung der dualen Endzugangsmöglichkeit für WeChat und QQ über „Kleine Hummer“,", "fr":="Titre : Tencent QClaw entre dans des tests internes : Support du double accès pour WeChat et QQ via « Petit Homme de Mer »,", "it":="Titolo: Tencent QClaw avvia test interni: Supporto per l’accesso da entrambi gli endpoint di WeChat e QQ tramite “Piccolo Gamberetto”"} |
Major
[Accuracy]
"Begins Internal Testing"
Reason: Hypothesis contains multiple language versions. Should be 'Rumor has it Tencent's QClaw has started internal testing: Enabling WeChat and QQ dual-end integration with OpenClaw' | Incorrect: '小龙虾' was translated as 'Little Lobster' in English; reference uses 'OpenClaw'. Also 'Rumor has it' is missing at the beginning. | Misses 'rumor has it' nuance from reference; states as fact instead of rumor/speculation. |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: CPA Reverse Proxy Codex Solution to 'Diminished Intelligence' Content: After being reminded by an 18th floor user, we found that the root cause of 'Diminished Intelligence' was actually due to API endpoint and reasoning parameter format mismatch, which can be resolved by following the corresponding mapping below: | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: CPA Reverse Proxy Codex Solution to 'Dumbed Down' | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: Solution to 'Dumbing Down' after Reverse Proxy Codex Content: As reminded by the 18th floor friend, upon checking, it was found that the root cause of 'dumbing down' is actually due to mismatch between API endpoint and reasoning parameter format. Follow the corresponding steps below to resolve the issue: | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: CPA Reverse Proxy Codex After \"Dumbing Down\" Solutions\nContent: After being reminded by a comrade on Floor 18, we found that the root cause of \"dumbing down\" is actually a mismatch between the API endpoint and the reasoning parameter format. Following the corresponding steps below should resolve this issue:", "zh-TW": "標題:CPA反代Codex後“變笨”的解決方法\n內容:經由第18樓友提醒,我們發現“變笨”的根本原因是API端點與reasoning參數格式不符所導致。請依照下列步驟操作即可解決此問題:", "ja": "タイトル:CPA リバースプロキシ Codex 後「頭がogui」の解決策\n内容:18階の仲間から指摘を受け、原因はAPIエンドポイントとreasoningパラメータフォーマットの不一致であることがわかりました。以下の手順に従うことでこの問題を解決できます:", "ko": "제목:CPA 리버스 프록시 Codex 후 “뇌가>tag” 해결 방법\n내용:18층의 친구로부터 안내를 받고, “뇌가>tag”의 근본 원인은 API 엔드포인트와 reasoning 파라미터 형식이 일치하지 않기 때문임을 확인했습니다. 아래 단계에 따라 이 문제를 해결할 수 있습니다:", "de": "Titel: CPA Reverse Proxy Codex nach „Intelligenzverlust“ Lösungsansätze\nInhalt: Nach einer Erinnerung eines Kollegen aus dem 18. Stock entdeckten wir, dass der Grund für den „Intelligenzverlust“ tatsächlich eine Mismatches zwischen API-Endpunkt und reasoning-Parameterformat ist. Folgen Sie den unten aufgeführten Schritten, um dieses Problem zu beheben:", "fr": "Titre : CPA Proxy Inverse Codex après la résolution du problème d’"espace mental réduit"\nContenu : Suite à un rappel d'un camarade du 18ème étage, nous avons découvert que la véritable cause de l'"espace mental réduit" est une incompatibilité entre le point d'entrée API et le format du paramètre reasoning. Suivez les étapes ci-dessous pour résoudre ce problème :", "it": "Titolo: CPA Reverse Proxy Codex dopo "diminuzione di intelligenza" soluzioni\nContenuto: Dopo essere stati ricordati da un compagno del piano 18, abbiamo scoperto che il vero motivo dell'"abbassamento dell'intelligenza" è effettivamente una disomogeneità tra l'endpoint API e il formato del parametro reasoning. Seguendo i passaggi riportati di seguito si dovrebbe risolvere questo problema:" } | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Titolo: Soluzioni per il "ridurre l'intelligenza" dopo il CPA reverse proxy Codex | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | "en": "Title: Solution to 'Stupidity' after using CPA Reverse Proxy Codex\nContent: As reminded by a user on post #18, we found that the root cause of 'stupidity' was due to a mismatch between the API endpoint and the reasoning parameter format. Following the corresponding steps below will resolve this issue.", "zh-TW": "標題:使用 CPA 反代 Codex 後「降智」的解決方案\n內容:經 18 樓網友提醒,檢查發現「降智」的根源其實是 API 端點與 reasoning 參數格式不匹配導致,具體按以下對應就能正常了哈:", "ja": "タイトル:CPA リバースプロキシ コデックスを使用した後、「知能低下」を解決する方法\n内容:18 番目の投稿のユーザーが指摘した通り、調査の結果、「知能低下」の原因は API エンドポイントと推論パラメータの形式の不整合によるものであり、以下の対応に従うことで正常に動作します。", "ko": "제목: CPA 리버스 프록시 코덱스를 사용 후 '지능 저하' 해결 방법\n내용: 18 번 게시물의 사용자에 의해 경고된 바와 같이, 조사 결과 '지능 저하'의 근본 원인은 API 엔드포인트 및 추론 파라미터 형식의 불일치로 인한 것이며, 아래에 해당하는 내용을 따르면 정상적으로 작동합니다.", "de": "Titel: Lösung für 'Dummheit' nach Verwendung von CPA Reverse Proxy Codex\nInhalt: Wie von einem Benutzer im 18. Beitrag gemeldet wurde, wurde festgestellt, dass die Ursache für 'Dummheit' ein Missmatch zwischen dem API-Endpunkt und dem Format des |
Major
[Accuracy]
"'Stupidity'"
Reason: Literal translation of '降智' (literally 'reduce intelligence') as 'Stupidity'. In this technical context, it refers to a 'downgrade' in model performance or a 'shadowban', as noted in the reference. | “降智” in this context maps to model quality downgrade/shadowban, not literal stupidity; the reference clarifies this nuance. | Incorrect translation: 'Stupidity' should be 'downgrade/shadowban' as per reference |
| Qwen/Qwen3.5-9B | Title: Solution to 'Diminished Intelligence' After Reverse Proxy Codex Content: As reminded by a senior user on 18FLOOR, the root cause of 'diminished intelligence' was found to be a mismatch between API endpoints and reasoning parameter formats. It works normally once you follow the corresponding mapping below: |
Major
[Accuracy]
"'diminished intelligence'"
Reason: Should be 'downgrade/shadowban' to match reference terminology | Mistranslation of slang '降智'. In AI contexts, this refers to model 'downgrade', 'lobotomization', or 'shadowban', not literal intelligence loss. | 'Diminished Intelligence' should be 'downgrade/shadowban' per reference |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | CPA反代Codex后"降智"的解决办法 |
Major
[Accuracy]
"CPA反代Codex后"降智"的解决办法"
Reason: Title is merely copied from the source without translation; should be rendered in the target language (en). | Not translated: The hypothesis is in Chinese, not English as required for target language translation | The hypothesis is identical to the source (Chinese), meaning no translation was performed. |
| google/gemma-3-12b-it | Title: Solution to "Reduced Intelligence" after CPA Reverse Proxy Codex |
Minor
[Terminology]
""Reduced Intelligence""
Reason: Should be 'downgrade/shadowban' instead of 'Reduced Intelligence' | 『降智』 in this context is closer to 'downgrade/shadowban' of model quality; 'Reduced Intelligence' is understandable but less idiomatic for this technical issue. | Literal translation of '降智'; the reference suggests 'downgrade' or 'shadowban' is the intended technical meaning. |
| google/gemma-3-1b-it | "en": "Translation of 'CPA反代Codex后'降智的解决办法", "zh-TW": "经18楼佬友提醒,检查发现'降智'的根源其实是API端点与reasoning参数格式不匹配导致,具体按以下对应就能正常了哈:", "ja": "18楼の友から、降智の問題の根本的な原因は、APIエンドポイントと reasoning パラメータのフォーマットが一致しないことによって発生しているという情報が得られました。具体的な対応は以下の通りです:", "ko": "18층 고수님께서 확인한 결과, '강지'의 근본적인 원인은 API 엔드포인트와 reasoning 파라미터의 형식이 일치하지 않아 발생했다는 것입니다. 구체적인 해결 방법은 다음과 같습니다:", "de": "Nach einer Rückmeldung von 18er-Freund wurde festgestellt, dass die Ursache für den 'KPA反代Codex nach' Fall für '降智' auf ein Ungleichgewicht zwischen API-Endpunkt und | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: Reverse Proxy Codex Fixes "Dullness" |
Major
[Accuracy]
"Reverse Proxy Codex Fixes "Dullness""
Reason: Incomplete translation - only title translated, content missing entirely; also 'Fixes' is imprecise - reference says 'Solution to' | The hypothesis translates the title as a statement that the issue is fixed, whereas the source title is a topic header asking for or discussing the solution ('...的解决办法'). It also mistranslates '降智' (downgrade/shadowban) as 'Dullness'. | Title omits “CPA” and misrepresents scope: it suggests reverse proxying Codex itself fixes dullness, while source is about a remedy after using CPA as reverse proxy. Also “降智” is more like downgrade/degradation than generic “dullness.” | Incomplete translation. Should be 'Solution to the downgrade/shadowban issue' as in reference. '降智' is not simply 'dullness'. |
| google/translategemma-12b-it | "en": "Title: Solution to \"Reduced Intelligence\" after using CPA Reverse Proxy Codex\nContent: As reminded by user 18, I checked and found that the root cause of \"Reduced Intelligence\" is actually a mismatch between the API endpoint and the reasoning parameter format. By following the correspondence below, it should work normally:", "zh-TW": "標題:使用CPA反向代理Codex後,「降低智能」的解決方案\n內容:如18樓使用者提醒,我檢查發現「降低智能」的根本原因實際上是API端點與推理參數格式不符。請依照下方對應方式操作,應該可以正常運作。", "ja": "タイトル:CPAリバースプロキシCodexの使用後に「知能低下」が発生した場合の解決策\n内容:18階のユーザーからの指摘を受け、調査した結果、「知能低下」の原因は、APIエンドポイントと推論パラメータの形式が一致していないことが判明しました。以下の対応を行うことで、通常通りに動作するはずです。", "ko": "제목: CPA 리버스 프록시 Codex 사용 후 \"지능 저하\" 문제 해결 방법\n내용: 18층 사용자분의 지적으로 확인한 결과, \"지능 저하\"의 근본 원인은 API 엔드포인트와 추론 파라미터 형식 불일치였습니다. 아래 내용을 참고하여 적용하시면 정상적으로 작동할 것입니다.", "de": "Titel: Lösung für \"Reduzierte Intelligenz\" nach Verwendung von CPA Reverse Proxy Codex\nInhalt: Wie vom Benutzer 18 angemerkt, wurde festgestellt, dass die Ursache für \"Reduzierte Intelligenz\" tatsächlich eine Nichtübereinstimmung zwischen dem API-Endpunkt und dem Format des | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: CPA Reverse Proxy Codex - Solution to "Dumbing Down" | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of 'Title: CPA反代Codex后"降智"的解决办法' in en", "zh-TW": "Translation of 'Title: CPA反代Codex后"降智"的解决办法' in zh-TW", "ja": "Translation of 'Title: CPA反代Codex后"降智"的解决办法' in ja", "ko": "Translation of 'Title: CPA反代Codex后"降智"的解决办法' in ko", "de": "Translation of 'Title: CPA反代Codex后"降智"的解决办法' in de", "fr": "Translation of 'Title: CPA反代Codex后"降智"的解决办法' in fr", "it": "Translation of 'Title: CPA反代Codex后"降智"的解决办法' in it"} |
Critical
[Accuracy]
""en": "Translation of 'Title: CPA反代Codex后"降智"的解决办法' in en"..."
Reason: Hypothesis is metadata/template text rather than actual translation. Should provide actual translated content. | The hypothesis is a malformed JSON snippet containing translation prompts for multiple languages instead of the actual translation. | The translation is not provided and only includes a placeholder. |
| tencent/HY-MT1.5-7B | Title: Solutions to the 'Intelligence Reduction' Issue After CPA Reverse Proxy Codex |
Major
[Accuracy]
"'Intelligence Reduction'"
Reason: Slightly different nuance from reference which uses 'downgrade/shadowban' | The hypothesis only translates the title and omits the content explaining the solution. | Misinterprets slang '降智' which in context refers to model downgrade/shadowban rather than actual intelligence reduction. |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: The biggest benefit from this wave is Kimi |
Major
[Accuracy]
"The biggest benefit from this wave is Kimi"
Reason: Missing context - should specify 'from this OpenClaw wave' as in reference | Omission of explicit reference to the "OpenClaw" wave slightly reduces specificity, though overall meaning remains close. | Missing context. Should be 'I feel like Kimi benefited the most from this OpenClaw wave' to clarify the subject and context. |
| CohereLabs/tiny-aya-water | Title: The biggest benefit from this wave is still Kimi | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: The biggest beneficiary of this round feels like it's still kimi Content: Driven by the Kimi K2.5 model and Kimi Claw, since early January, Kimi has achieved... |
Minor
[Terminology]
"kimi"
Reason: Should clarify as 'Kimi Claw (OpenClaw)' for better context. | Should be capitalized 'Kimi' | Should capitalize as 'Kimi' to match product naming convention. | Incorrect translation of '小龙虾' as 'kimi' instead of 'OpenClaw'. |
| Qwen/Qwen2.5-7B-Instruct | Title: Little Lobster feels most benefited this time around is still Kimi Content: Driven by the Kimi K2.5 model and Kimi Claw, since early January, Kimi has set... |
Major
[Terminology]
"Little Lobster"
Reason: Should be 'OpenClaw' not 'Little Lobster' - incorrect terminology | Incorrect translation of '小龙虾' as 'Little Lobster' instead of 'OpenClaw'. | In context, “小龙虾” refers to the OpenClaw project; the literal rendering “Little Lobster” would be read as the animal, whereas the reference has “this OpenClaw wave”. |
| Qwen/Qwen3-14B | I maggiori beneficiari di questa ondata di gamberetti sono probabilmente Kimi |
Minor
[Accuracy]
"gamberetti"
Reason: Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | Severe omission. The hypothesis only translates the title, completely missing the content about Kimi K2.5 model and the growth since late January. | Again mistranslates 小龙虾 referring to OpenClaw ecosystem as literal shrimp; obscures the AI framework reference. | Accuracy: '小龙虾' (crayfish/lobster) is a nickname for OpenClaw, not literal crustaceans. The translation 'gamberetti' (small shrimp) is literal and loses the tech context. Should be 'OpenClaw'. |
| Qwen/Qwen3.5-4B | Title: The feeling that crayfish benefit the most from this move is still Kimi Content: Driven by the Kimi K2.5 model and Kimi Claw, since the end of January, Kimi has created... |
Major
[Accuracy]
"crayfish benefit the most from this move is still Kimi"
Reason: Should be 'I feel like Kimi benefited the most from this OpenClaw wave'; awkward phrasing and missing context | Mis-parses sentence: subject is that Kimi benefits most from the OpenClaw wave, not that ‘crayfish’ benefit from Kimi; reverses roles and introduces “crayfish” not present in source. | Misinterpretation of the sentence structure and slang. The source means 'The one who benefited most from this OpenClaw wave is Kimi'. The hypothesis suggests the crayfish themselves benefit. | Reference says 'I feel like Kimi benefited the most from this OpenClaw wave' - different structure and terminology |
| Qwen/Qwen3.5-9B | Title: Crayfish seem to benefit most from this wave; I feel like Kimi is the biggest beneficiary Content: Driven by the Kimi K2.5 model and Kimi Claw, since late January, Kimi has created... |
Major
[Accuracy]
"Crayfish"
Reason: Should be 'I feel like Kimi benefited the most from this OpenClaw wave' - '小龙虾' is slang for OpenClaw | Misinterpretation of syntax. The source means 'Regarding this OpenClaw wave, the biggest beneficiary is Kimi', not that the crayfish themselves are benefiting. | “小龙虾” refers to OpenClaw; literal “Crayfish” loses the product/brand meaning. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The biggest gain this time is still kimi |
Major
[Accuracy]
"The biggest gain this time is still kimi"
Reason: Misses reference to 'this OpenClaw wave' and lacks clarity that Kimi is the beneficiary; wording is vague/ambiguous. | Incorrect: Missing context that this is about OpenClaw wave. Should be 'Kimi benefited the most from this OpenClaw wave' | Misses the subject 'OpenClaw wave' (小龙虾这波). The source says 'This OpenClaw wave, the biggest beneficiary feels like Kimi'. The hypothesis loses the context of what 'this time' refers to. | Should be 'I feel like Kimi benefited the most from this OpenClaw wave' - more accurate interpretation |
| google/gemma-3-12b-it | Title: It feels like Kimi benefited the most from this wave of Xiaolongxia. |
Major
[Terminology]
"this wave of Xiaolongxia"
Reason: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Xiaolongxia' | Used Pinyin 'Xiaolongxia' instead of the project name 'OpenClaw'. | '这波' refers to the OpenClaw-related wave; using 'Xiaolongxia' (little lobster) instead of OpenClaw loses the concrete product reference. | Should be 'OpenClaw' instead of 'Xiaolongxia' | Should be 'OpenClaw' not 'Xiaolongxia' - incorrect terminology |
| google/gemma-3-1b-it | The feeling that is most beneficial for the largest number of people is Kimi K2.5 and Kimi Claw. |
Major
[Accuracy]
"The feeling that is most beneficial for the largest number of people is Kimi K2.5 and Kimi Claw."
Reason: The hypothesis misinterprets the source. The source says 'The biggest beneficiary of this OpenClaw wave feels like Kimi'. The hypothesis changes the meaning to 'The feeling that is most beneficial for the largest number of people'. It also omits the second sentence about Kimi's achievements since late January. | Misinterprets the structure; original states Kimi benefits most from the OpenClaw wave driven by Kimi K2.5 and Kimi Claw, not that the 'feeling' benefits people. | Mistranslation of the title. Should be 'I feel like Kimi benefited the most from this OpenClaw wave' not about 'largest number of people'. Reference correctly captures the meaning. |
| google/gemma-3-4b-it | The feeling that benefited the most from this wave is kimi (Little Lobsters) | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: It feels like Kimi benefited the most from this wave of small lobsters. |
Major
[Terminology]
"this wave of small lobsters"
Reason: Incorrect translation - should be 'OpenClaw' not 'small lobsters' | Failed to translate '小龙虾' as 'OpenClaw'. Literal translation obscures the meaning. | "这波" refers to the OpenClaw trend; rendering it as "wave of small lobsters" introduces an unnecessary literalism and obscures link to OpenClaw. | Should be 'OpenClaw' to match the reference. |
| google/translategemma-4b-it | Title: Small Shrimp - The biggest beneficiary seems to be kimi |
Major
[Terminology]
"Small Shrimp"
Reason: Incorrect translation of '小龙虾' as 'Small Shrimp' instead of 'OpenClaw'. | Should be 'OpenClaw' not 'Small Shrimp' per reference. | “小龙虾” is OpenClaw; ‘Small Shrimp’ is an incorrect animal and loses the OpenClaw product reference. |
| tencent/HY-MT1.5-1.8B | Login |
Critical
[Accuracy]
"Login"
Reason: Complete mistranslation - source discusses Kimi benefiting from OpenClaw wave, hypothesis is 'Login' which is completely unrelated | Hypothesis is completely unrelated to source. Source discusses OpenClaw and Kimi benefits, not login. Incorrect translation. | The translation 'Login' does not accurately convey the specific context of the reference. | The hypothesis completely fails to translate the source text about Kimi and OpenClaw, outputting an unrelated single word. | Single unrelated English word; does not translate or reflect anything about Kimi benefiting from the OpenClaw wave. |
| tencent/HY-MT1.5-7B | "en": "Title: The One Who Really Benefited from the Little Lobster Campaign Is Probably Kimi", "zh-TW": "Title: The One Who Really Benefited from the Little Lobster Campaign Is Probably Kimi", "ja": "Title: The One Who Really Benefited from the Little Lobster Campaign Is Probably Kimi", "ko": "Title: The One Who Really Benefited from the Little Lobster Campaign Is Probably Kimi", "de":="Titel: Die Person, die wirklich am meisten von der Little Lobster-Kampagne profitiert hat, ist wahrscheinlich Kimi", "fr":="Titre: La personne qui a vraiment bénéficié le plus de la campagne Little Lobster est probablement Kimi", "it":="Titolo: La persona che ha realmente tratto il maggior beneficio dalla campagna Little Lobster è probabilmente Kimi"} | 通过 (无共识错误) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Drawing】Quick 3 level reached, draw an award team slot*8 |
Major
[Accuracy]
"team slot*8"
Reason: Source "team车位" refers specifically to GPT Team shared seats; "team slot" is vague and misses the GPT Team context. | Incorrect translation - 'team车位' should be 'GPT Team shared seats' not 'team slot' | Inaccurate translation. Should be '[Lucky Draw] Almost level 3, hosting a draw: GPT Team shared seats * 8'. 'team车位' refers to GPT Team shared seats, not generic 'team slot'. |
| CohereLabs/tiny-aya-water | Title: Win a prize by sharing your account and using our platform! |
Critical
[Accuracy]
"Win a prize by sharing your account and using our platform!"
Reason: Completely unrelated to source. Source is about a lucky draw for reaching level 3 with GPT Team shared seats as prizes, not about sharing accounts. | Hallucination. Completely unrelated to the source text which is about a lucky draw for team seats. Missing Content translation. | Completely fabricated - source is about lucky draw for GPT Team seats, hypothesis gives promotional message | The translation does not match the source content. | Completely fabricated call-to-action unrelated to level-3 draw and team seats; none of the source meaning is preserved. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Draw] About to level 3, let's draw some prizes! Team slots *8 Content: Prize details: [Prize]: [Team shared subscription slots (no warranty) *8] | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Draw] Level 3 now, draw a prize team slots*8 Content: Prize Details:[Prize]:[team slots (no warranty)*8] |
Major
[Terminology]
"team slots*8"
Reason: Should be 'GPT Team shared seats' not 'team slots' - incorrect terminology | In context, “team车位” means “GPT Team shared seats”; “slots” does not convey the specific GPT Team meaning. | Should be 'GPT Team shared seats' not 'team slots'. The reference clarifies this is GPT Team feature. |
| Qwen/Qwen3-14B | 【Sorteggio】Vicino al livello 3, vinci un posto team *8 |
Major
[Accuracy]
"posto team *8"
Reason: Accuracy: 'team车位' refers to 'GPT Team shared seats' (team seats/slots), not a literal 'posto team' (team spot). The translation loses the specific meaning of shared seats in a team plan. | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | The translation misinterprets the source meaning. '快 3 级了抽个奖' means 'Almost reached level 3, hosting a lucky draw', but the translation says 'Near level 3, win a team spot', changing the subject from the host to the winner and omitting the action of hosting. Additionally, 'team 车位' (team parking spot/shared seat) is translated as 'posto team' which is vague and misses the specific gaming context implied by '车位'. The quantity '*8' is attached confusingly to the single prize description rather than indicating 8 units of the prize. | Omits key qualifier ‘GPT’ and ‘shared seats’ plus ‘no warranty’; loses important contextual details about the prize. |
| Qwen/Qwen3.5-4B | Title: [Lottery] Level 3+ - Claim your Team Slot *8 Content: Prize Details: [Prize]: [Team Slot (No Warranty) *8] |
Major
[Accuracy]
"Level 3+ - Claim your Team Slot *8"
Reason: Reference says 'Almost level 3, hosting a draw' not 'Level 3+' | Mistranslation of 'team 车位'. In this context, '车位' (parking spot) is slang for a 'shared seat' or 'spot' in a shared account (GPT Team), not a literal 'slot' or parking space. Reference uses 'shared seats'. | Should be '[Lucky Draw] Almost level 3, hosting a draw: GPT Team shared seats * 8'; missing context about GPT Team | “快3级了抽个奖” means “almost level 3, holding a lottery”, not that level 3+ users can ‘claim’ slots; changes the nature of the event. |
| Qwen/Qwen3.5-9B | Title: [Giveaway] Almost level 3, enter for a prize! 8 shared subscription slots Content: Prize details: [Prize]: [8 shared subscription slots (no warranty)] |
严重
[准确性]
"8 shared subscription slots"
理由: “team车位” in this context specifically refers to GPT Team shared seats; generic “subscription slots” omits the GPT Team aspect. | Should be 'GPT Team shared seats' per reference | Should be 'GPT Team shared seats' to match reference - 'team车位' refers to GPT Team shared seats |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Lottery Level 3 Draw - Win a Prize! Team Slot x8 |
严重
[准确性]
"Lottery Level 3 Draw - Win a Prize!"
理由: '快3级了抽个奖' means 'Almost level 3, so I'll do a giveaway', not 'Level 3 draw'; subtle meaning shift. | Incorrect: '快3级了' means 'almost level 3', not 'Lottery Level 3' | Misinterprets '快 3 级了' (Almost level 3) as 'Level 3 Draw'. The user is hosting a draw because they are approaching level 3, not because it is a 'Level 3 Draw'. |
| google/gemma-3-12b-it | "en": { "Title": "[Raffle] Level 3 - Raffle for Team Slots x8", "Content": "Prize Details: [Prize]: [Team Slot (no warranty) x8]" }, "zh-TW": { "Title": "[抽獎] 快3級了抽個獎 團隊車位*8", "Content": "獎品詳情:[獎品]:[團隊車位(無質保)*8]" }, "ja": { "Title": "[抽選] レベル3到達!チーム枠を抽選します x8", "Content": "賞品詳細:[賞品]:[チーム枠(保証なし)x8]" }, "ko": { "Title": "[추첨] 레벨 3 달성! 팀 자리 추첨 x8", "Content": "상품 상세 정보: [상품]: [팀 자리 (보증 없음) x8]" }, "de": { "Title": "[Verlosung] Level 3 – Verlose Team Slots x8", "Content": "Preisdetails: [Preis]: [Team Slot (keine Garantie) x8]" }, "fr": { "Title": "[Tirage au sort] Niveau 3 – Tirage pour des places d’équipe x8", "Content": "Détails du prix : [Prix] : [Place d’équipe (sans garantie) x8]" }, "it": { "Title": "[Lotteria] Livello 3 – Lotteria per posti squadra x8", "Content": "Dettagli del premio: [Premio]: [Posto squadra (senza garanzia) x8]" } } <|im_end|> |
严重
[术语]
"Team Slots"
理由: Context suggests 'GPT Team shared seats' rather than generic 'Team Slots'. | Should be 'GPT Team shared seats' not 'Team Slots' | Source and reference specify 'GPT Team shared seats'; generic 'Team Slots' drops the GPT Team context and shared-seat nuance. |
| google/gemma-3-1b-it | Quick 3rd level draw - prize team seat (no guarantee) |
严重
[准确性]
"Quick 3rd level draw"
理由: The hypothesis mistranslates '快 3 级了' (Almost level 3) as 'Quick 3rd level'. It also omits the quantity '*8' in the title and the specific prize name 'GPT Team shared seats' (translated vaguely as 'team seat'). | '快3级了' means 'almost level 3', not 'quick 3rd level'. | Incomplete and inaccurate translation. Should be 'Almost level 3, hosting a draw: GPT Team shared seats * 8' and 'team seat' should be 'GPT Team shared seats'. |
| google/gemma-3-4b-it | "en": "Title: [Sweepstakes] Level 3 Raffle - Team Slot *8\nContent: Prize Details: [Prize]: [Team Slot (No Warranty) *8]", "zh-TW": "標題:【抽獎】快3級了抽個獎 隊伍車位*8\n內容:獎品詳情:[獎品]:[隊伍車位(無保固)*8]", "ja": "タイトル:[くじ引き] レベル3くじ抽選 - チームスロット*8\nコンテンツ:賞品詳細:[賞品]:[チームスロット(保証なし)*8]", "ko": "제목: [당첨 이벤트] 레벨 3 당첨 추첨 - 팀 슬롯\*8\n내용: 상품 상세: [상품] : [팀 슬롯(보증 없음)\*8]", "de": "Titel: [Verlosung] Level 3 Raffle – Team-Slot *8\nInhalt: Gewinnerdetails: [Preis]: [Team-Slot (keine Garantie) *8]", "fr": "Titre : [Jeu concours] Niveau 3 tirage au sort - Place d’équipe \*8\nDétails du prix : [Prix] : [Place d’équipe (sans garantie) \*8]", "it": "Titolo: [Concorso] Livello 3 lotteria - Posto squadra *8\nDettagli del premio: [Premio]: [Posto squadra (senza garanzia) *8]" } <|im_end|> | 通过 (无共识错误) |
| google/translategemma-12b-it | 【Raffle】Almost Level 3! Raffle for 8 Team Slots |
严重
[准确性]
"Team Slots"
理由: Should be '[Lucky Draw] Almost level 3, hosting a draw: GPT Team shared seats * 8' not just 'Team Slots' | Source "team车位" and reference specify "GPT Team shared seats"; "Team Slots" loses the GPT/team-seat specificity, which may be important context. | Reference says 'Almost level 3, hosting a draw: GPT Team shared seats * 8' not 'Almost Level 3! Raffle for 8 Team Slots' | The hypothesis omits the content section entirely. Also, 'team 车位' is specific slang for 'shared team seats/slots', 'Team Slots' is acceptable but 'shared seats' is more precise. |
| google/translategemma-4b-it | Title: 【Giveaway】Level 3 Giveaway - 8 Team Slots |
严重
[准确性]
"【Giveaway】Level 3 Giveaway - 8 Team Slots"
理由: Source and reference specify ‘team车位’ as GPT Team shared seats with no warranty. “Team Slots” omits GPT, the ‘shared seats’ nuance, and the ‘no warranty’ condition. | Inaccurate translation. '车位' means 'seats' or 'shared seats'. Should be 'GPT Team shared seats' not 'Team Slots'. | Should be 'Almost level 3' not 'Level 3', and 'GPT Team shared seats' not 'Team Slots' per reference. |
| tencent/HY-MT1.5-1.8B | "en": "Title: 【Lotto】It's almost level 3, take a prize - team parking spaces * 8", "zh-TW": "標題:【抽獎】快3級了,抽個獎 - 團隊停車位*8", "ja": "タイトル:【ロト】3レベルに近い、賞を引く - チームの駐車スペース*8", "ko": "제목: 【로토】3레벨에 가까워서 상을 뽑아요 - 팀 주차 공간 * 8", "de": "제목: 【Lotto】Es ist fast Level 3, einen Preis gewinnen - Team-Parkplätze * 8", "fr": "제목: 【Lotto】Il est presque niveau 3, gagner un prix - Places de stationnement du groupe * 8", "it": "제목: 【로토】이미 레벨 3에 가까워서 상을 받으세요 - 팀 주차 공간 * 8"} |
致命
[准确性]
""en": "Title: 【Lotto】It's almost level 3, take a prize - team parking spaces * 8"..."
理由: Hypothesis provides multiple language translations when only English is required. Also 'team parking spaces' is inaccurate; should be 'GPT Team shared seats' per reference. | The hypothesis is a malformed JSON object containing translations in multiple languages (some incorrect, e.g., German using Korean characters) instead of a single English translation. | The translation is not provided and only includes a placeholder. |
| tencent/HY-MT1.5-7B | Title: [Lottery] Nearly at Level 3? Get a Prize: team车位*8 |
严重
[准确性]
"team车位*8"
理由: Failed to translate the slang '车位' (parking spot) which metaphorically means 'shared seat' or 'slot' in a team subscription. Leaving it in Chinese makes the translation incomplete for an English target. | Incorrect: 'team车位' was kept as-is; reference translates as 'GPT Team shared seats'. Also '快3级了' should be 'Almost level 3' not 'Nearly at Level 3?' | Should be 'Almost level 3, hosting a draw: GPT Team shared seats * 8' - missing context about GPT Team and shared seats | Leaves key term untranslated; should indicate 'GPT Team seats/slots * 8' or similar to convey prize meaning. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Why can't I use the 7-day API provided by minimax? My paid API can be used. | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Why can't I use the 7-day API provided by minimax? My paid API can be used, but this one doesn't work. | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Why can't I use the 7-day API provided by minimax with lobster attached, but my own paid one works? Content: As a beginner, I wanted to get some freebies but failed. The API I paid one dollar for works though. Could someone help me check this error report? |
严重
[术语]
"with lobster attached"
理由: Awkward and literal translation. Should be 'connected to OpenClaw'. | Incorrect translation - should be 'connected to OpenClaw' not 'with lobster attached' | Should be 'OpenClaw' not 'lobster'. While 'lobster' is slang, the proper product name is 'OpenClaw'. | '龙虾' is the OpenClaw system; phrase should be 'when connected to OpenClaw' instead of literal 'lobster' which obscures the tool being discussed. |
| Qwen/Qwen2.5-7B-Instruct | Title: Why can't I use the 7-day API from minimax when connected to LinuxDo, but my paid one works? Content: Newbie tried to exploit, failed; a one yuan API actually works. Please help me check, below is the error. |
严重
[准确性]
"when connected to LinuxDo"
理由: Should be 'when connected to OpenClaw' not 'LinuxDo'. LinuxDo is the platform, OpenClaw is the product. | The source “接上龙虾” means connecting to OpenClaw; LinuxDo is a forum, a different entity, so this is an entity mistranslation. | Mistranslation of '接上龙虾'. '龙虾' refers to OpenClaw, not the forum 'LinuxDo'.
轻微
[风格]
"tried to exploit"
理由: “薅羊毛” means “take advantage of freebies/score some freebies”; “exploit” is slightly too strong in tone and skews negative. | Should be 'As a newbie, I wanted to fleece some freebies but failed' to match reference tone | '薅羊毛' is better translated as 'fleece freebies' or 'get freebies' in this context; 'exploit' sounds too malicious. |
| Qwen/Qwen3-14B | Perché non posso utilizzare l'API da 7 giorni fornita da Minimax con Lobster, mentre quella che ho pagato io funziona? |
轻微
[准确性]
"Lobster"
理由: Severe omission. The hypothesis only translates the title, completely missing the content where the user explains they are a newbie trying to get freebies, the 1 RMB top-up, and the request for help with the error message. | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | 小龙虾 here is OpenClaw product name; translating as literal ‘Lobster’ loses the product/brand reference. | Accuracy: '小龙虾' (crayfish/lobster) is a nickname for OpenClaw, not actual lobster. The translation 'Lobster' is literal and loses the tech context. Should be 'OpenClaw'. |
| Qwen/Qwen3.5-4B | Title: Why can't I use the 7-day API connection sent by minimax with lobster, but my own paid one works? Content: A beginner tried to get freebies and failed; however, a $1 API credit worked. Please help me check the error below. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Why can't I use Minimax's free 7-day API with Loxia? My own paid credits work fine. Content: A beginner tried to get a free ride but failed; one yuan of API credit works, please help me check the error below. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Why can't I use the 7-day API sent by minimax with shrimp (dragon虾), but my own money-charged API works? |
严重
[术语]
"with shrimp (dragon虾)"
理由: Should be 'with OpenClaw' not 'with shrimp'. '龙虾' is slang for OpenClaw, not literal shrimp. | Incorrect: '小龙虾' is slang for 'OpenClaw', should be 'OpenClaw' not 'shrimp' | Confusing mixture of English and Chinese ('dragon虾'); should consistently use 'OpenClaw' or the correct product name instead of literal shrimp. | Confused and incorrect translation of '龙虾'. It should be 'OpenClaw'. The addition of '(dragon 虾)' is hallucinated and nonsensical. |
| google/gemma-3-12b-it | Title: Why does the 7-day API provided by minimax not work with Lobster, but my own paid API works? |
严重
[术语]
"Lobster"
理由: Again, using 'Lobster' instead of the framework name 'OpenClaw' makes it unclear which system the MiniMax API is connected to. | Incorrectly translated '龙虾' as 'Lobster' instead of 'OpenClaw'. | Incorrect translation of '龙虾' - should be 'OpenClaw' not 'Lobster' | Should be 'OpenClaw' not 'Lobster' |
| google/gemma-3-1b-it | Why did Minimax send a 7-day API to get shrimp? It doesn't work, and I spent my own money but can use it. |
严重
[准确性]
"Why did Minimax send a 7-day API to get shrimp? It doesn't work, and I spent my own money but can use it."
理由: Misrenders '接上龙虾用不了' (doesn't work when connected to OpenClaw) and literal 'get shrimp' for OpenClaw; confuses which API works. | The phrase 'to get shrimp' is a mistranslation of '接上龙虾' (connected to OpenClaw/Little Shrimp). The user didn't send an API to get shrimp; they tried to use the free API with OpenClaw. The hypothesis also omits the request for help and the mention of the error message. | Mistranslation. Should be 'Why doesn't the 7-day free API from MiniMax work when connected to OpenClaw, but the one I paid for does?' The hypothesis misses the context about connecting to OpenClaw. |
| google/gemma-3-4b-it | Why does the minimax 7-day API stop working when using lobster? I paid for it myself, but it works. |
严重
[术语]
"when using lobster"
理由: “接上龙虾” means connecting to OpenClaw; literal “lobster” loses the established product name and may confuse readers. | Incorrect translation - 'lobster' should be 'OpenClaw' (小龙虾 is a nickname for OpenClaw) | Should be 'OpenClaw' not 'lobster'. Reference correctly uses 'OpenClaw'. | Translates the slang '龙虾' literally as 'lobster' instead of the tool name 'OpenClaw', leading to confusion. |
| google/translategemma-12b-it | Title: Why does the 7-day API provided by minimax not work with Lobster, while my own paid API works? |
严重
[术语]
"Lobster"
理由: Failed to translate '龙虾' as 'OpenClaw'. | Incorrect translation - should be 'OpenClaw' not 'Lobster' | As with other items, "龙虾" is the OpenClaw project; using "Lobster" alone omits the OpenClaw name and weakens clarity of reference. |
| google/translategemma-4b-it | Why does the 7-day API from minimax not work after connecting to lobster, while my own paid API works? |
严重
[术语]
"lobster"
理由: Should be 'OpenClaw' not 'lobster' per reference. | Refers to OpenClaw; using generic ‘lobster’ instead of the product name loses the intended meaning. | Should be 'OpenClaw' not 'lobster'. While '小龙虾' literally translates to small lobster, in this community context it refers to the OpenClaw product. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '为什么minimax送的7天API接上龙虾用不了,我自己的花钱充的却能用' in en", "zh-TW": "翻译 '为什么minimax送的7天API接上龙虾用不了,我自己的花钱充的却能用' 在中文繁体", "ja": "翻译 '为什么minimax送的7天API接上龙虾用不了,我自己的花钱充的却能用' 在日语中", "ko": "翻译 '为什么minimax送的7天API接上龙虾用不了,我自己的花钱充的却能用' 在韩语中", "de": "翻译 '为什么minimax送的7天API接上龙虾用不了,我自己的花钱充的却能用' 在德语中", "fr": "翻译 '为什么minimax送的7天API接上龙虾用不了,我自己的花钱充的却能用' 在法语中", "it": "翻译 '为什么minimax送的7天API接上龙虾用不了,我自己的花钱充的却能用' 在意大利语中" }, "error": "" } |
致命
[准确性]
""en": "Translation of '为什么minimax送的7天API接上龙虾用不了,我自己的花钱充的却能用' in en"..."
理由: The translation is not provided and only includes a placeholder. | The hypothesis is a malformed JSON object containing meta-instructions in multiple languages instead of the actual translation. | Hypothesis is metadata/template text with multiple language labels rather than actual translation. Should provide actual translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: Why can’t I use the 7-day API provided by minimax when it’s connected to ‘LinuxDo’, but my own paid API works?", "zh-TW": "标题:為何將minimax提供的7天API連接到‘LinuxDo’後無法使用,而我自己支付的API卻可以使用?", "ja": "タイトル:なぜminimaxが提供する7日間のAPIを‘LinuxDo’に接続しても使用できないのに、自分で支払ったAPIは使えるのか?", "ko": "제목: 왜 minimax가 제공하는 7일간의 API를 ‘LinuxDo’에 연결해도 사용할 수 없는데, 직접 돈을 내고 구입한 API는 사용할 수 있는 걸까요?", "de":="Titel: Warum kann ich die 7-tägige API, die von minimax bereitgestellt wird, nicht mit ‘LinuxDo’ verwenden, obwohl meine eigene bezahlte API funktioniert?", "fr":="Titre : Pourquoi ne peux-je pas utiliser l’API de 7 jours fournie par minimax lorsque je la connecte à ‘LinuxDo’, alors que mon propre API payant fonctionne ?", "it":="Titolo: Perché non riesco a utilizzare l’API di 7 giorni fornita da minimax quando la collego a ‘LinuxDo’, mentre la mia API pagata funziona?"}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: anyrouter 接了 cpa 给小龙虾,发现 opus 好像 70% 的请求都是缓存写,额度烧的飞快。 |
致命
[准确性]
"anyrouter 接了 cpa 给小龙虾,发现 opus 好像 70% 的请求都是缓存写,额度烧的飞快。"
理由: No translation performed. The output is identical to the source Chinese text. | No translation - hypothesis is in Chinese (same as source), but reference is in English | No translation provided. The hypothesis is identical to the source in Chinese. Should be translated to English. |
| CohereLabs/tiny-aya-water | Title: How to deal with ultra-low cache hit rate for opus when connecting anyrouter to cuaipiao (cpa) for small shrimp? |
严重
[准确性]
"cuaipiao (cpa)"
理由: "cuaipiao" is hallucinated; source only has "cpa". This introduces an incorrect term. | Hallucination/Mistranslation. 'cpa' is kept but 'cuaipiao' is invented/incorrect. | Should be 'CPA' not 'cuaipiao (cpa)'. 'cuaipiao' is not a correct transliteration or term.
严重
[术语]
"for small shrimp"
理由: Incorrect translation - '小龙虾' refers to OpenClaw, not 'small shrimp' | Should be 'OpenClaw' not 'small shrimp'. '小龙虾' is slang for OpenClaw. | "小龙虾" refers to OpenClaw, not literal small shrimp. |
| Qwen/Qwen2.5-14B-Instruct | Title: How to deal with extremely low cache hit rate when integrating Opus with AnyRouter for crayfish? Content: After integrating AnyRouter with CPA for crayfish, I found that about 70% of requests from Opus seem to be cache writes, burning through the quota very quickly. |
严重
[术语]
"crayfish"
理由: Should be 'OpenClaw' not 'crayfish'. The proper product name is 'OpenClaw'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'crayfish' | '小龙虾' is the OpenClaw product; translating as 'crayfish' loses the specific product meaning. | Failed to translate the slang '小龙虾' to 'OpenClaw', using the literal biological term 'crayfish' instead. |
| Qwen/Qwen2.5-7B-Instruct | Title: How to Handle Low Quota Usage for Opus Cache Hits on anyrouter's opus? Content: After connecting anyrouter to cpa and giving it to the shrimp, I noticed that about 70% of opus requests seem to be cache writes, burning through the quota quickly. |
严重
[准确性]
"How to Handle Low Quota Usage for Opus Cache Hits"
理由: Contradicts the source. Source says '缓存命中超低' (cache hit rate is super low), which causes high quota usage. Hypothesis title implies low usage is the problem. | The original title is “缓存命中超低怎么整?”; the core issue is a low cache hit rate, not low quota usage, which the hypothesis misreads as “quota usage low”. | Should be 'The cache hit rate for Anyrouter's Opus connected to OpenClaw is super low, how to fix this?' - misunderstands the problem
严重
[术语]
"giving it to the shrimp"
理由: Should be 'for OpenClaw' not 'give it to the shrimp' - incorrect terminology | Should be 'I connected Anyrouter to CPA for OpenClaw' - 'giving it to the shrimp' is awkward | Incorrect translation of '小龙虾' as 'shrimp' instead of 'OpenClaw'. | “给小龙虾” means hooking it up for OpenClaw to use; the literal “shrimp” is ambiguous and does not convey the project name. |
| Qwen/Qwen3-14B | Come gestire un tasso di colpo del cache estremamente basso quando si collega l'opus di anyrouter alle gamberi? |
轻微
[准确性]
"alle gamberi"
理由: Accuracy: '小龙虾' (crayfish/lobster) is a nickname for OpenClaw, not actual crustaceans. The translation 'gamberi' (shrimp/prawns) is literal and loses the tech context. Should be 'OpenClaw'. | Translation is in Italian instead of target language Chinese (zh). The hypothesis should be in Chinese, not Italian. | Uses literal ‘gamberi’ (shrimp) instead of product name OpenClaw; confuses meaning. | Severe omission. The hypothesis only translates the title, completely missing the content about connecting to CPA, 70% cache writes, and burning through quota. |
| Qwen/Qwen3.5-4B | Title: How to optimize ultra-low cache hit rate for Opus when connecting AnyRouter to Xiaolongxia? Content: AnyRouter is connected to CPA via Xiaolongxia, and I noticed that about 70% of requests seem to be cache writes, causing quota consumption to skyrocket. |
严重
[准确性]
"Xiaolongxia"
理由: Should be 'The cache hit rate for Anyrouter's Opus connected to OpenClaw is super low, how to fix this'; '小龙虾' is OpenClaw | Leaves the Chinese nickname untranslated and does not indicate it is OpenClaw; reference explicitly mentions OpenClaw. | Reference title is 'The cache hit rate for Anyrouter's Opus connected to OpenClaw is super low, how to fix this?' - different structure
严重
[术语]
"Xiaolongxia"
理由: Should be 'OpenClaw' not 'Xiaolongxia' (the Chinese term) as per reference | Transliteration of '小龙虾' as 'Xiaolongxia' instead of using the recognized English name 'OpenClaw'. | Should be 'OpenClaw' instead of 'Xiaolongxia'. |
| Qwen/Qwen3.5-9B | Title: How to handle extremely low cache hit rates for opus when connecting anyrouter's shrimp? Content: After anyrouter connected CPA to shrimp, it was found that about 70% of opus requests were cache writes, burning through quotas very quickly. |
严重
[准确性]
"anyrouter's shrimp"
理由: Should be 'The cache hit rate for Anyrouter's Opus connected to OpenClaw is super low, how to fix this?' - '小龙虾' is slang for OpenClaw | Confusing phrasing due to wrong terminology. Should be 'connecting Anyrouter to OpenClaw'. | “小龙虾” refers to OpenClaw; “shrimp” is an incorrect literal rendering. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | anyrouter 接了 cpa 给小龙虾,发现 opus 好像 70% 的请求都是缓存写,额度烧的飞快。 | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Extremely low cache hit rate for AnyRouter's Opus when connecting to Xiaolongxia Cache - What should I do? |
严重
[术语]
"Xiaolongxia Cache"
理由: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'Xiaolongxia' | Should be 'OpenClaw' not 'Xiaolongxia Cache' - incorrect terminology | Incorrectly translated '小龙虾' as 'Xiaolongxia' and incorrectly appended 'Cache' which was not in the source. | Should be 'OpenClaw' instead of 'Xiaolongxia' |
| google/gemma-3-1b-it | What should be done when anyrouter's Opus connection cache hit is found to be 70% cached requests? | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: How to deal with extremely low small lobster cache hit when using anyrouter's opus? |
严重
[术语]
"small lobster"
理由: Incorrect translation - 'small lobster' should be 'OpenClaw' (小龙虾 is a nickname for OpenClaw) | “小龙虾” is nickname for OpenClaw; literal “small lobster” obscures the product reference in a technical networking context. | Should be 'OpenClaw' not 'small lobster'. Reference correctly uses 'OpenClaw'. | Incorrectly translates the slang '小龙虾' as 'small lobster' instead of the project name 'OpenClaw'. |
| google/translategemma-12b-it | Title: Why is the cache hit rate for small lobster caching via anyrouter's Opus so low? Content: When using anyrouter to connect CPA to Xiaolongxia (small lobster), I found that about 70% of requests seem to be cache writes, and the quota is being consumed very quickly. |
严重
[术语]
"small lobster"
理由: Incorrect translation - should be 'OpenClaw' not 'small lobster' | Failed to translate '小龙虾' as 'OpenClaw'. | "小龙虾" is the nickname for OpenClaw; translating literally here instead of using "OpenClaw" obscures the product being discussed. |
| google/translategemma-4b-it | Title: How to fix the extremely low cache hit rate for small lobster when using anyrouter's opus? |
严重
[术语]
"small lobster"
理由: Should be 'OpenClaw' not 'small lobster' per reference. | Should be 'OpenClaw' not 'small lobster'. While '小龙虾' is the literal translation, the product name OpenClaw should be used in technical context. | Again, should refer to OpenClaw; “small lobster” loses the product name and is inconsistent with the community usage. |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾' in English |
致命
[准确性]
"Translation of '龙虾' in English"
理由: The hypothesis outputs a meta-instruction string instead of translating the source text about Anyrouter and cache hit rates. | Hypothesis is a placeholder template rather than actual translation; missing title and content translation | Hypothesis is metadata/template text rather than actual translation. Should provide actual translated title and content, not just a single word translation. | Only states that it is a translation of the word, but does not actually translate or capture the OpenClaw-specific context in the title and content. |
| tencent/HY-MT1.5-7B | "en": "Title: What to do if the cache hit rate for anyrouter’s opus when connecting to小龙虾 is extremely low?", "zh-TW": "标题:当anyrouter连接小龙虾时,如果opus的缓存命中率极低该怎么办?", "ja": "タイトル:anyrouterが小龙虾に接続した際にopusのキャッシュヒット率が非常に低い場合、どう対処すればよいか?", "ko": "제목: anyrouter가小龙虾에 연결했을 때opus의 캐시 히트율이 매우 낮은 경우 어떻게 해야 할까?", "de":="Titel: Was tun, wenn die Cache-Hit-Rate von anyrouter bei der Verbindung zu小龙虾 extrem niedrig ist?", "fr":="Titre : Que faire si le taux de réussite du cache d’opus pour anyrouter lors de la connexion à小龙虾 est extrêmement bas ?", "it":="Titolo: Cosa fare se il tasso di successo del cache di opus per anyrouter durante la connessione a小龙虾 è estremamente basso?"}|im_end|> |
严重
[准确性]
"What to do if the cache hit rate for anyrouter's opus when connecting to小龙虾 is extremely low?"
理由: Failed to translate the slang term '小龙虾' (OpenClaw). It is left in Chinese characters within an English sentence. | Incorrect: '小龙虾' was kept in Chinese characters instead of being translated as 'OpenClaw' in English version | Hypothesis contains multiple language versions and mixed Chinese/English. Should be 'The cache hit rate for Anyrouter's Opus connected to OpenClaw is super low, how to fix this?' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Codex feels like it's not going to last much longer before following antigravity and cc |
严重
[准确性]
"antigravity and cc"
理由: The hypothesis translates only the title and omits the entire content paragraph. Additionally, 'antigravity' and 'cc' are left untranslated/unexplained, whereas the reference clarifies them as specific services (Antigravity and Claude Code). | Mistranslation: 'codex' should be 'Codex Desktop' or 'Codex', but the translation 'Codex feels like it's not going to last much longer' is acceptable. However, 'antigravity' should be 'Antigravity (free API site)', and 'cc' should be 'Claude Code'. The translation is incomplete - missing the Content translation entirely. | Should be 'Antigravity and Claude Code' not just 'cc'. The reference clarifies 'cc' refers to Claude Code, not just an abbreviation. | The translation is slightly less concise than the reference and omits the word 'fate'. |
| CohereLabs/tiny-aya-water | Title: Codex feels like it won't last much longer before following antigravity and cc |
轻微
[准确性]
"Title: Codex feels like it won't last much longer before following antigravity and cc"
理由: The translation uses 'feels' instead of 'feel', which is less direct. | Does not mention the constraint about doubled quota before April 2 or the nuance of tightening limits, though core idea of Codex not lasting long is preserved. | Should be 'Antigravity and Claude Code' or similar. 'cc' is unclear abbreviation; reference shows 'Claude Code' | The hypothesis only translates the title but completely omits the entire content paragraph provided in the source. Additionally, 'antigravity' and 'cc' are left untranslated/unexplained, whereas the reference clarifies them as specific services (Antigravity, Claude Code). |
| Qwen/Qwen2.5-14B-Instruct | Title: Feeling that Codex won't last long either and will follow antigravity and cc Content: Tried out Codex Desktop today, and upon entering, it showed double credits for before version 4.2. It seems they'll start tightening up around 4.2 or slightly earlier. The era of hard riders is coming to an end, right? In the future, I don't think there will be many ways to get free plus and team versions. |
严重
[准确性]
"before version 4.2"
理由: Mistranslation of '4.2' which refers to April 2nd (date), not a software version number. | Reference says 'before April 2' (4月2日), not 'before version 4.2' - this is a mistranslation of the date | Source means "before April 2 (4.2 as a date)", not software version 4.2.
轻微
[准确性]
"free plus and team versions"
理由: Missed the slang '海鲜' (Xianyu/second-hand market) which implies the method of obtaining these versions, though the core meaning is partially retained. | Should be something like "free Plus and Team via loopholes" to reflect "搞免费plus和team" and the grey-hat nuance seen in reference. | Reference says '海鲜' (Xianyu, a platform name) but hypothesis translates as 'I don't think there will be many ways' - missing the platform reference |
| Qwen/Qwen2.5-7B-Instruct | Title: codex feels like it won't be long before it follows antigravity and cc Content: Played around with codex desktop today, and it immediately prompted double points for those before version 4.2. It seems that from around 4.2 or earlier, things will start to tighten up. Guys, the era of rapid growth is probably coming to an end. In the future, there might not be much room for free plus and team deals on seafood. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Codex sembra che non durerà molto prima di seguire antigravity e cc |
严重
[准确性]
"Codex sembra che non durerà molto prima di seguire antigravity e cc"
理由: Translation is in Italian instead of Chinese (target language is zh). The hypothesis should be in Chinese, not Italian. | The hypothesis only translates the title and completely omits the entire content paragraph regarding Codex Desktop, quota limits, and Xianyu loopholes. | The term 'cc' is not explicitly defined, which could be confusing. | Only translates the title and omits all content about doubled quota before 4.2, tightening, freeriding ending, and Xianyu loopholes for free Plus/Team. |
| Qwen/Qwen3.5-4B | Title: The Codex feels like it won't last long before following antigravity and cc. |
严重
[准确性]
"Title: The Codex feels like it won't last long before following antigravity and cc."
理由: The translation does not accurately reflect the reference's mention of 'Claude Code' and the fate aspect. | Translation is incomplete - missing the Content part of the source. Also 'antigravity' should be 'Antigravity' (proper noun) and 'cc' should be 'Claude Code' or 'CC' for consistency with reference. | Should be 'antigravity and Claude Code' or similar; 'cc' is unclear abbreviation, reference shows 'Claude Code' | Omission of key source content about doubled quota before 4.2, expected tightening, freeloading ending, and Xianyu loopholes for free Plus/Team; only the title is translated and content is entirely missing. | The hypothesis only translates the title, completely omitting the entire content paragraph which contains critical context about quota tightening and the end of free access methods. |
| Qwen/Qwen3.5-9B | Title: It feels like Codex won't last long before following Antigravity and cc. Content: I played around with Codex Desktop today; upon entering, it prompted that balances were doubled before version 4.2. It seems tightening will start by or before 4.2. The veterans say the rush is ending soon. In the future, there might not be many ways to get free Plus and Team access. |
严重
[准确性]
"balances were doubled before version 4.2"
理由: Source says the quota is doubled before April 2 (a date), not software version 4.2. | Mistranslation of '4.2' as a software version instead of the date April 2nd. Mistranslation of '额度' (quota) as 'balances'. | Should be 'quota' not 'balances'; reference uses 'quota'
轻微
[准确性]
"there might not be many ways to get free Plus and Team access"
理由: Source mentions "海鲜" (Xianyu, a second-hand marketplace) as the channel; this marketplace context is omitted. | Omission of '海鲜' (Xianyu/second-hand market), which specifies the source of the free accounts in the original text. | Missing context about 'Xianyu loopholes'; should reference 'loopholes on Xianyu' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: It won't take long before Codex feels like it needs to follow Antigravity and cc. |
轻微
[准确性]
"It won't take long before Codex feels like it needs to follow Antigravity and cc."
理由: Mistranslation of the source meaning. The source '感觉玩不了多久也要追随' means 'I feel it won't last long before it follows...', whereas the hypothesis says 'Codex feels like it needs to follow', attributing the feeling to the software itself rather than the user. Also, 'cc' is left untranslated/ambiguous compared to the reference which clarifies it as 'Claude Code'. | The hypothesis misses the key meaning of '感觉玩不了多久也要追随' which implies Codex will soon follow the fate/path of Antigravity and cc (likely discontinuation). The translation should convey inevitability and fate, not just a feeling about following. | The translation should use 'I feel like' instead of 'Codex feels like' and 'Claude Code' instead of 'cc'. | Misses nuance of “玩不了多久” (won’t last long to play with) and “也要追随…和cc了” (meet the same fate as), though overall sentiment is preserved. |
| google/gemma-3-12b-it | Title: codex feels like it won't be playable for long, also have to follow antigravity and cc |
严重
[准确性]
"codex feels like it won't be playable for long, also have to follow antigravity and cc"
理由: The translation should be more accurate to the source, e.g., 'Codex won't last long and will soon follow the fate of Antigravity and Claude Code'. | Misinterprets "玩不了多久" (won’t last long) as "won’t be playable" (about playability) and omits content about quota tightening, double quota before 4.2, and ending of free Plus/Team routes via Xianyu. | The translation fails to interpret slang terms. 'codex', 'antigravity', and 'cc' are specific product nicknames (Codex, Antigravity, Claude Code) that should be capitalized or translated contextually. 'follow' misses the nuance of 'sharing the same fate/shutting down'. The content is missing entirely. | Incomplete translation. Should convey 'will soon follow the fate of' rather than just 'follow'. The hypothesis misses the implication that these services will shut down like Antigravity and Claude Code did. |
| google/gemma-3-1b-it | Codex feels like I can't play for a while now, and Antigravity and cc will be following soon. |
严重
[准确性]
"Codex feels like I can't play for a while now,"
理由: Complete mistranslation of meaning. The source says Codex will follow Antigravity and cc (in terms of restrictions/closure), but the hypothesis says Antigravity and cc will follow Codex. Also misses the context of 'playing' referring to using the service freely. | The translation does not fully capture the source's meaning about the tightening of quotas and the end of freeriding. | Incomplete translation - only translates the title, missing the entire content section. The hypothesis does not translate the full source text. | Misinterprets “玩不了多久” which means it won’t last long, not that the speaker personally can’t play for a while. |
| google/gemma-3-4b-it | Title: codex feels like I’ll have to follow along with antigravity and cc soon. Content: I tried playing codex desktop briefly today, and immediately received a prompt about double credit before 4.2, feeling that 4.2 or slightly earlier will start tightening up. The veterans are stopping their intense pedaling, so there won't be any free plus or team for seafood later. |
严重
[准确性]
"Title: codex feels like I’ll have to follow along with antigravity and cc soon."
理由: Mistranslates '追随' context; should convey 'will soon follow the fate of' rather than 'follow along with' | Mistranslation of subject: The source says 'codex' will follow 'antigravity and cc', but the hypothesis says 'I' will follow them. Also missed the nuance of 'won't last long'. | Misinterprets “玩不了多久” (won’t last long) as having to “follow along with” others, and omits the idea of Codex shutting down like Antigravity and CC.
严重
[准确性]
"double credit before 4.2"
理由: Should be 'quota doubled' instead of 'double credit'. | Reference says 'quota is doubled' not 'double credit' | Source implies quota/usage limit doubled, not clearly just “credit”; date format also ambiguous vs April 2.
严重
[准确性]
"The veterans are stopping their intense pedaling"
理由: Reference says 'The bros' heavy freeriding is coming to an end' - '猛蹬' is slang for 'freeriding/abusing' not literal pedaling | Literal mistranslation of slang '猛蹬' (freeriding/exploiting loopholes). 'Pedaling' makes no sense in this context. | “猛蹬要结束了” refers to heavy freeriding/exploiting quota, not literally pedaling; meaning is distorted. | Incorrect interpretation of '佬们猛蹬要结束了'; should be 'The bros' heavy freeriding is coming to an end' not 'veterans stopping pedaling'
严重
[准确性]
"there won't be any free plus or team for seafood later"
理由: Literal mistranslation of '海鲜' (Xianyu), which is a second-hand trading platform name, not actual seafood. | Mistranslates '海鲜' and context; should be 'there probably won't be any loopholes on Xianyu to get free Plus and Team' | “海鲜” here is slang for Xianyu (a marketplace), not “seafood”; changes meaning and becomes nonsensical. |
| google/translategemma-12b-it | Title: codex - It doesn't feel like I'll be able to use it for much longer, and I guess I'll have to follow antigravity and cc soon. |
严重
[准确性]
"Title: codex - It doesn't feel like I'll be able to use it for much longer, and I guess I'll have to follow antigravity and cc soon."
理由: The hypothesis translates '龙虾' (lobster/OpenClaw) as 'codex' and misses the context that 'antigravity and cc' should be 'Antigravity and Claude Code'. The reference correctly identifies these as product names. | Omission of content about doubled quota before April 2, tightening, freeloading ending, and future lack of free Plus/Team routes via Xianyu; only the title is translated and much of the source content is missing. | The hypothesis translates the title but completely omits the translation of the 'Content' section provided in the source. Additionally, 'cc' is not translated or explained, whereas the reference clarifies it as 'Claude Code'. |
| google/translategemma-4b-it | Title: codex - I feel like I won't be able to keep up with antigravity and cc for long. |
严重
[准确性]
"Title: codex - I feel like I won't be able to keep up with antigravity and cc for long."
理由: Severe mistranslation of the subject and meaning. The source says 'codex' will follow the fate of 'antigravity' and 'cc' (implying they are shutting down or losing free access), but the hypothesis claims the speaker cannot 'keep up' with them. It also fails to translate the content. | Misinterprets "玩不了多久" (won't last long) as "won't be able to keep up" and omits the idea of codex following the fate of antigravity and cc; overall title meaning is wrong. | The translation does not accurately reflect the source's meaning about following the fate of Antigravity and Claude Code. | Mistranslation of '玩不了多久也要追随' - should mean 'won't last long and will follow' not 'won't be able to keep up with'. Also 'codex' should be capitalized as 'Codex' and the dash usage is awkward. |
| tencent/HY-MT1.5-1.8B | Translation of 'codex感觉玩不了多久也要追随antigravity和cc了' in en |
严重
[准确性]
"Translation of 'codex感觉玩不了多久也要追随antigravity和cc了' in en"
理由: Hypothesis is not a translation but a placeholder text indicating translation request | Placeholder text does not provide an actual translation and omits the meaning about Codex not lasting long and following Antigravity and Claude Code. | The translation is a direct translation without considering the context, leading to a less accurate representation. | The hypothesis is a meta-description of the translation task rather than the actual translation of the source text. It fails to convey any meaning from the source. |
| tencent/HY-MT1.5-7B | "en": "Title: It seems that we won't be able to play Codex for much longer before following Antigravity and CC.", "zh-TW": "标题:看來在跟隨Antigravity和CC之前,我们大概不能再玩Codex太久了。", "ja": "タイトル:AntigravityやCCに続く前に、もうすぐCodexをプレイできなくなりそうだ。」, "ko": "제목: Antigravity와 CC에 이어서, 우리는 곧 Codex를 더 이상 플레이할 수 없게 될 것 같다。」, "de":="Titel: Es scheint, dass wir Codex nicht mehr lange spielen können, bevor wir Antigravity und CC folgen.“, "fr":="Titre : Il semble que nous ne pourrons plus jouer à Codex avant longtemps, avant de suivre Antigravity et CC.", "it":="Titolo: Sembra che non potremo più giocare a Codex per molto tempo, prima di seguire Antigravity e CC."} |
严重
[准确性]
"Title: It seems that we won't be able to play Codex for much longer before following Antigravity and CC."
理由: Omits the main content about doubled quota before April 2, tightening limits, and losing ways to get free Plus/Team via Xianyu; only the title is translated. | Source uses 'codex' as slang for OpenClaw/Claude Code, but hypothesis translates it literally as 'Codex' product name, missing the contextual meaning | The translation misinterprets the logical relationship. The source says Codex will follow the fate of Antigravity and CC (i.e., become unavailable/restricted), not that users will follow Antigravity and CC before stopping playing Codex. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Seeking openclaw's guide to raising crabs |
严重
[准确性]
"openclaw"
理由: Source uses 'openclaw' as a proper name for a tool/service, not a literal lobster; leaving it lowercase and untranslated is acceptable, but the rest of the title misinterprets it as about raising real lobsters. | The hypothesis omits the content of the source text completely, only providing a mistranslated title. | The translation incorrectly translates 'tuning manual' as 'guide to raising crabs', which changes the intended meaning.
致命
[准确性]
"raising crabs"
理由: Serious mistranslation: renders the internet slang “养龙虾” (deploying OpenClaw) as “raising crabs”, which changes the topic entirely to crab husbandry and does not match the reference meaning. | Should be 'OpenClaw' not 'crabs'. '龙虾' (lobster/shrimp) is slang for OpenClaw in this context, not literal crabs. | The term '龙虾' (lobster) is slang for 'OpenClaw' in this context. The hypothesis translates it literally as 'crabs' (wrong animal) and misses the technical meaning entirely, unlike the reference which correctly identifies it as 'OpenClaw tuning manual'. |
| CohereLabs/tiny-aya-water | Title: Seeking OpenCLaw's guide to raising crabs |
严重
[准确性]
"Title: Seeking OpenCLaw's guide to raising crabs"
理由: Should be 'OpenClaw' not 'crabs'. '龙虾' (lobster/shrimp) is a metaphor for OpenClaw in this context, not literal crabs | Literal translation of the slang '养龙虾' (raising lobsters/crabs). In this context, it refers to deploying/running the 'OpenClaw' software, not actual animals. The reference correctly identifies this as 'OpenClaw tuning' or deployment. | Misinterprets gaming slang metaphor: "养龙虾" refers to deploying/using OpenClaw, not literally raising crabs. Also drops the nuance of a tuning/manual guide. |
| Qwen/Qwen2.5-14B-Instruct | Title: Seeking an openclaw lobster training manual Content: Recently, I want to keep my own lobster. Is there any lobster keeping manual? I remember that one was banned due to copyright issues on this platform before. Are there any other high-quality lobster keeping manuals? Thanks. |
严重
[准确性]
"lobster keeping manual"
理由: Same terminological misinterpretation; should reflect OpenClaw tuning/usage manual rather than caring for animals. | Failed to recognize '龙虾' (lobster) as slang for 'OpenClaw'. Translated literally as the animal. | Completely misses the metaphor. '龙虾' (lobster) is slang for OpenClaw. Should be 'Looking for an OpenClaw tuning manual' not literal lobster training |
| Qwen/Qwen2.5-7B-Instruct | Title: openclaw Crab Training Manual Content: Recently wanted to keep my own crab, is there any crab keeping manual? I remember there was one in the station recently that got banned due to copyright issues, are there any other high-quality crab keeping manuals? Thanks a lot. |
严重
[准确性]
"openclaw"
理由: Omitted in title; source is about OpenClaw, not real crabs. | The title conflates the proper noun 'OpenClaw' with the literal animal 'Crab', resulting in nonsense. | The translation is correct but could be more precise. |
| Qwen/Qwen3-14B | Cerco un manuale di addestramento per il granchio openclaw |
严重
[准确性]
"openclaw"
理由: Translation is in Italian instead of Chinese. Also 'granchio' (crab) is incorrect - should reference OpenClaw (a tool/platform). Target language should be Chinese. | The hypothesis only translates the title and omits the entire content paragraph asking for manuals and mentioning copyright bans. | Source is about OpenClaw (a tool), not a literal animal; treating it as an animal changes the meaning. |
| Qwen/Qwen3.5-4B | Title: Requesting an OpenCLAW Lobster Training Manual Content: Recently I want to raise my own lobster. Is there any manual for raising lobsters? I remember that a post about this was banned due to copyright infringement earlier this month. Are there any other high-quality manuals for raising lobsters? Thanks. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Request for OpenClaw Lobster Training Manual Content: Recently I want to raise my own lobster. Is there any manual on raising lobsters? I remember that a post about this was banned due to copyright issues recently. Are there any other high-quality manuals on raising lobsters? Thanks, folks. |
严重
[准确性]
"raise my own lobster"
理由: Literal translation of slang '养龙虾' (raising lobster). In this context, it refers to deploying/using 'OpenClaw', not actual crustaceans. | "养龙虾" here is community slang for deploying/using OpenClaw, not literally raising a lobster. | Should be 'deploy my own OpenClaw' not 'raise my own lobster' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Looking for an openclaw claw crab manual |
致命
[准确性]
"Looking for an openclaw claw crab manual"
理由: Completely mistranslates “龙虾” (lobster / OpenClaw) as “claw crab” and omits the request/guide nuance; overall content very incomplete. | The hypothesis fails to translate the content of the source text, only providing a garbled title. It misses the entire body text regarding copyright bans and requests for manuals. | The hypothesis completely mistranslates '龙虾' (OpenClaw) as 'claw crab' and adds redundant 'claw'. Should be 'OpenClaw tuning manual' not 'openclaw claw crab manual'.
严重
[术语]
"claw crab"
理由: The term '龙虾' should be translated as 'tuning' instead of 'claw crab'. | Incorrect translation - '龙虾' refers to OpenClaw (a software), not literal crab. This is a slang/terminology error where the software name is mistranslated as a literal animal | Incorrect translation of the slang term '龙虾'. In this context, '龙虾' refers to 'OpenClaw' (a software tool), not a literal crab. The hypothesis translates it literally as 'claw crab', losing the intended meaning. |
| google/gemma-3-12b-it | Title: Request for an OpenClaw Lobster Training Manual Content: I recently wanted to raise my own lobster. Are there any lobster care manuals? I remember one was banned due to copyright issues some time ago. Are there any other high-quality lobster care manuals available? Thanks. |
致命
[准确性]
"OpenClaw Lobster"
理由: Failed to recognize '龙虾' (lobster) as community slang for 'OpenClaw'. The translation interprets it literally as the animal, whereas the context (OpenClaw, copyright ban) clearly indicates the software tool. | "OpenClaw" is a project name metaphorically called "龙虾"; translating it literally as "Lobster" plus keeping "OpenClaw" creates semantic confusion and misses the specific tool name meaning. | Misses slang context. Should be 'deploy my own OpenClaw' not 'raise my own lobster'. |
| google/gemma-3-1b-it | Title: Request openclaw dragon shrimp manual guide | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: Seeking an Openclaw Lobster Training Manual Content: Recently I wanted to keep my own lobster, does anyone have a lobster keeping manual? I remember that there was one on the site that was shut down due to copyright issues, is there any other high-quality lobster keeping manual? Thank you, seniors. |
严重
[准确性]
"Openclaw Lobster Training Manual"
理由: “养龙虾” is slang for deploying/using OpenClaw, not literally training a lobster; term should reflect OpenClaw configuration/tuning manual. | Misses the slang context; '龙虾' refers to OpenClaw deployment, not literal lobster training | Failed to recognize 'lobster' (龙虾) as slang for 'OpenClaw'. Translated literally instead of using the technical term. | Should be 'deploying OpenClaw' instead of 'lobster keeping manual'.
严重
[准确性]
"keep my own lobster"
理由: Again treats “龙虾” literally instead of OpenClaw; distorts intended meaning. | Should be 'deploy my own OpenClaw' not 'keep my own lobster' | Literal translation of slang '养龙虾' (deploy OpenClaw). Context implies software deployment, not animal husbandry. |
| google/translategemma-12b-it | Title: OpenClaw Lobster Care Manual Content: I recently wanted to raise my own lobster. Are there any lobster care manuals available? I remember that there was one on this forum before that was banned due to copyright issues. Are there any other good lobster care manuals available? Thanks. |
严重
[准确性]
"I recently wanted to raise my own lobster. Are there any lobster care manuals available?"
理由: Same issue: treats 龙虾 as real lobster instead of OpenClaw, changing the topic from software deployment to animal care. | Literal translation of slang '养龙虾' (raising lobster) which actually means deploying/running OpenClaw instances. This changes the meaning entirely. | The hypothesis literally translates '龙虾' as 'lobster' when it's a metaphor for 'OpenClaw' (a tool/service). The reference correctly interprets this as 'deploy my own OpenClaw'. |
| google/translategemma-4b-it | Title: Guide to raising openclaw lobsters |
严重
[准确性]
"Guide to raising openclaw lobsters"
理由: Misinterprets "调教"/"养" here, which refers to configuring/tuning the OpenClaw tool, not literally raising lobsters; loses software/tool context and the request-for-manual nuance. | Misinterprets the slang '养龙虾' (raising lobsters) as literal animal husbandry. In this context, it refers to deploying/running 'OpenClaw' software. The title also misses the intent 'Looking for' (求). | The translation incorrectly translates '调教' as 'raising' instead of 'tuning'. | Mistranslation - '龙虾' in this context is slang for OpenClaw (a tool), not literal lobsters. Should be 'OpenClaw tuning/deployment manual' not 'raising lobsters'. Also 'openclaw' should be capitalized as 'OpenClaw'. |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
轻微
[准确性]
"Translation of '登录' in en"
理由: The hypothesis is a meta-description and does not translate the source title or content. Additionally, it references '登录' (Login) which is not the main content of the source provided. | Hypothesis is not a translation of the source content but a completely unrelated placeholder text 'Login' | The translation is incorrect and does not fit the context. | Source does not contain '登录'; hypothesis is unrelated to the source content about an OpenClaw manual. |
| tencent/HY-MT1.5-7B | "en": "Title: Guide to Raising OpenClaw Lobsters", "zh-TW": "标题:OpenClaw龙虾饲养指南", "ja": "标题:OpenClaw龙虾的饲养方法", "ko": "标题:OpenClaw龙虾养殖指南", "de":="Titel: Anleitung zur Zucht von OpenClaw-Lobstern", "fr": "Titre: Guide à l'élevage des homards OpenClaw", "it": "Titolo: Guida all'allevamento degli aragoste OpenClaw"} |
严重
[准确性]
"Guide to Raising OpenClaw Lobsters"
理由: Source uses '龙虾调教手册' (lobster training manual) as slang for OpenClaw deployment guide, but hypothesis translates it as literal lobster raising guide instead of recognizing the metaphor | Literal translation of slang. '养龙虾' (raising lobsters) is community slang for deploying/running 'OpenClaw'. Translating it as actual lobsters misleads the reader about the technical subject matter. | Misinterprets metaphorical "养龙虾" (deploy/use OpenClaw) as literally raising lobsters; changes meaning from model-tuning manual to animal care guide. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Shenzhen to introduce policy encouraging "raising crabs and shrimp", supporting OpenClaw&OPC development |
致命
[准确性]
""raising crabs and shrimp""
理由: Should be 'deploying OpenClaw'. '龙虾' is slang for OpenClaw, not literal crabs/shrimp. | Incorrectly renders “养龙虾” (deploying OpenClaw) literally as “raising crabs and shrimp”, introducing a “shrimp” that is not in the source and losing the specific meaning of OpenClaw deployment. | The source '养龙虾' is slang for deploying OpenClaw. The hypothesis translates it literally as 'raising crabs and shrimp', which is nonsensical in this technical context. The reference correctly translates it as 'deploying OpenClaw'. |
| CohereLabs/tiny-aya-water | Title: Shenzhen to Introduce Policy Encouraging 'Raising Crabs and Shrimp', Supporting OpenClaw & OPC Development |
严重
[准确性]
"'Raising Crabs and Shrimp'"
理由: Misinterprets the slang '养龙虾' (deploying OpenClaw) as literal animal farming. Also adds 'Shrimp' which is not in the source ('龙虾' is lobster/crayfish, but here it's a code name). | Should be 'deploying OpenClaw'. '养龙虾' is a metaphor for deploying OpenClaw, not literal crabs/shrimp | The translation uses slang ('raising crabs and shrimp') instead of the proper term 'deploying OpenClaw'. | Source is about encouraging OpenClaw deployment ('养龙虾' as slang), not raising crabs and shrimp as animals. |
| Qwen/Qwen2.5-14B-Instruct | Title: Shenzhen plans to introduce policies encouraging 'raising lobsters', supporting the development of OpenClaw & OPC Content: Proposes encouraging market-oriented and specialized platform carriers to launch 'lobster service areas', offering free deployment services for OpenClaw |
严重
[准确性]
"encouraging 'raising lobsters'"
理由: Failed to translate slang '养龙虾' which means deploying OpenClaw. Literal translation makes no sense in a policy context. | "养龙虾" is a slang for deploying/using OpenClaw, not literal lobster raising; conflicts with explicit OpenClaw mentions. | Misses the slang metaphor. '养龙虾' refers to deploying OpenClaw, not literal lobster farming. Reference correctly translates as 'deploying OpenClaw' |
| Qwen/Qwen2.5-7B-Instruct | Title: Shenzhen Aims to Promote Policies Encouraging ‘Shrimp Farming’, Supporting OpenClaw&OPC Development Content: Propose encouraging market-oriented and specialized platform carriers to launch ‘Shrimp Service Areas’, providing free deployment services for OpenClaw |
严重
[准确性]
"Shrimp Farming"
理由: Failed to identify '养龙虾' as slang for deploying OpenClaw. Translated literally as farming shrimp, which contradicts the context of software development. | Incorrect translation of '养龙虾'. In this policy context, it means 'deploying OpenClaw', not literal shrimp farming. | The translation is accurate but could be more precise. |
| Qwen/Qwen3-14B | Shenzhen introdurrà una politica per incoraggiare l'allevamento di 'gamberi', supportando lo sviluppo di OpenClaw & OPC |
轻微
[准确性]
"allevamento di 'gamberi'"
理由: The term 'gamberi' (crabs) is a mistranslation of '龙虾' (lobsters). | The hypothesis only translates the title and omits the content about 'OpenClaw Service Areas' and free deployment services. | ‘养龙虾’ here refers to running/deploying OpenClaw, not literally raising shrimp; translation is misleading. | Translation is in Italian instead of Chinese. Target language is zh (Chinese), not Italian. |
| Qwen/Qwen3.5-4B | "en": "Title: Shenzhen plans to issue policies encouraging \"raising crayfish,\" supporting OpenClaw & OPC development\nContent: Proposes encouraging market-oriented and specialized platform carriers to launch \"Crayfish Service Zones,\" providing free OpenClaw deployment services", "zh-TW": "標題:深圳擬出台政策鼓勵「養龍蝦」,支援 OpenClaw 與 OPC 發展\n內容:建議鼓勵市場化、專業化平台載體推出「龍蝦服務區」,免費提供 OpenClaw 部署服務", "ja": "タイトル:深圳市がカニの飼育を奨励する政策を策定し、OpenClaw と OPC の発展をサポートします\n内容:市場主導・専門的なプラットフォームキャリアに「カニサービスゾーン」の導入を促し、OpenClaw のデプロイメントサービスを無料提供する提案", "ko": "제목: 진저니가 '고등어 기르기'를 장려하는 정책을 발표하고, OpenClaw 와 OPC 발전을 지원함\n내용: 시장 중심 및 전문화된 플랫폼 캐리어에 '고등어 서비스 구역'을 도입하여 무료로 OpenClaw 배포 서비스를 제공하도록 제안함", "de": "Titel: Shenzhen plant, die Politik zur Förderung des \"Züchtens von Garnelen\" zu erlassen und die Entwicklung von OpenClaw & OPC zu unterstützen\nInhalt: Vorschlag, marktorientierte und spezialisierte Plattformträger mit \"Garnelen-Servizonen" auszustatten und kostenlose OpenClaw-Deploymentservices anzubieten", "fr": "Titre : Pékin prévoit de publier une politique encourageant la \"culture de crevettes\", soutenant le développement d'OpenClaw et OPC\nContenu : Proposition d'encourager les porteurs de plateforme marchands et spécialisés à lancer des \"zones de service crevette\" et fournir gratuitement des services de déploiement OpenClaw", "it": "Titolo: Shenzhen prevede di emanare politiche che incoraggiano l'allevamento di granchi, sostenendo lo sviluppo di OpenClaw e OPC\nContenuto: Propone di incoraggiare i portatori di piattaforma orientati al mercato e specializzati a lanciare \"zone di servizio granchio\" e fornire gratuitamente servizi di distribuzione OpenClaw" } |
严重
[准确性]
""en": "Title: Shenzhen plans to issue policies encouraging \"raising crayfish,\" supporting OpenClaw & OPC development"
理由: Failed to translate the slang '养龙虾' (raising lobsters/crayfish) which refers to 'deploying OpenClaw'. It literally translates it to an agricultural activity. | “养龙虾” is slang for deploying/using OpenClaw, not literally raising crayfish; this misinterprets the metaphor. | Should be 'deploying OpenClaw'; the metaphor 'raising crayfish/lobster' refers to OpenClaw deployment, not literal crayfish |
| Qwen/Qwen3.5-9B | Title: Shenzhen plans to introduce policies encouraging the cultivation of 'Lobster', supporting the development of OpenClaw & OPC Content: Proposes encouraging market-oriented and specialized platform carriers to launch 'Lobster Service Zones', providing free OpenClaw deployment services. |
严重
[准确性]
"cultivation of 'Lobster'"
理由: Literal translation of slang '养龙虾'. Context clearly indicates 'deploying OpenClaw' as confirmed by the reference and subsequent text. | Should be 'deploying OpenClaw' not 'cultivation of Lobster'; misses the slang context | "养龙虾" here means deploying/using OpenClaw, not cultivating lobsters.
严重
[准确性]
"'Lobster Service Zones'"
理由: Should be 'OpenClaw Service Areas' not 'Lobster Service Zones' | The service areas are for OpenClaw, not generally for "Lobster"; loses product-specific meaning. | Literal translation of slang. Should be 'OpenClaw Service Areas' to reflect the actual technical service being discussed. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | "en": "Title: Shenzhen is expected to draft policies to encourage 'dragon shrimp' breeding, supporting OpenClaw & OPC development\nContent: Propose encouraging market-oriented and specialized platform carriers to launch \"Dragon Shrimp Service Zones\" with free deployment services for OpenClaw.", "zh-TW": "Title: 深圳拟出台政策鼓励\"养龙虾\",支持OpenClaw&OPC发展\nContent: 提出鼓励市场化、专业化平台载体推出\"龙虾服务区\",免费提供OpenClaw部署服务", "ja": "Title: 深圳が政策を提案して「龍蟹の養殖」を奨励し、OpenClaw & OPCの開発を支援する予定です。\nContent: 「龍蟹サービスゾーン」を市場化と専門化したプラットフォーム・キャリアが展開することを奨励し、OpenClawのデプロイメントを無料で提供する予定です。", "ko": "Title:深圳拟出台政策鼓励"养龙虾",支持OpenClaw&OPC发展\nContent:提出鼓励市场化、专业化平台载体推出"龙虾服务区",免费提供OpenClaw部署服务", "de": "Title: Die Stadt Shenzhen plant, ein Politik zu erlassen, um die Zucht von 'Ligustina' zu fördern und die Entwicklung von OpenClaw & OPC zu unterstützen.\nContent: Es wird vorgeschlagen, die Marktwirtschaftliche und spezialisierten Plattformträger zu bewegen, einen \"Ligustina-Dienstbereich\" auszuschöpfen und OpenClaw ohne Kosten zu deployen.", "fr": "Title: La ville de Shenzhen est en train de rédiger une politique pour encourager l'élevage de 'l'animal des dragons'\nContent: On suggère d'encourager les plateformes de charge spécialisées et marchandises de lancer un \"Zone de service de dragon shrimps\" et de fournir le déploiement gratuit d'OpenClaw.", "it": "Title: La città di Shenzhen sta predisponendo una politica per incoraggiare l'allevamento di 'gamberi draghi'\nContent: Si propone di incoraggiare le piattaforme di caricamento specializzati e di mercato a lanciare una \"Zona di servizio dei gamberi draghi\" e offrire il dispiegamento gratuito di OpenClaw." } <|im_end|> | 通过 (无共识错误) |
| google/gemma-3-12b-it | "en": "Title: Shenzhen Plans to Introduce Policies Encouraging \"Crawfish Farming\", Supporting OpenClaw & OPC Development\nContent: Proposing the launch of market-oriented and specialized platform carriers offering \"Crawfish Service Zones\" providing free OpenClaw deployment services.", "zh-TW": "標題:深圳市擬推出政策鼓勵「養龍蝦」,支持OpenClaw&OPC發展\n內容:建議推出市場化、專業化的平台載體,推出「龍蝦服務區」,免費提供 OpenClaw 部署服務。", "ja": "タイトル:深セン市は「エビ(ザリガニ)」養殖を奨励し、OpenClaw&OPCの開発を支援する方針を発表予定\n内容:市場志向で専門的なプラットフォームキャリアの立ち上げを提案し、「エビ(ザリガニ)サービスゾーン」を提供することで、OpenClawのデプロイメントサービスを無料で提供します。", "ko": "제목: 선전시, '새우(게)' 양식을 장려하고 OpenClaw 및 OPC 개발을 지원하는 정책 도입 계획\n내용: 시장 지향적이고 전문적인 플랫폼 사업자 출시를 제안하며, '새우(게) 서비스 구역'을 통해 OpenClaw 배포 서비스를 무료로 제공합니다.", "de": "Titel: Shenzhen plant die Einführung von Maßnahmen zur Förderung der „Krabbenzucht“, Unterstützung von OpenClaw & OPC Entwicklung\nInhalt: Vorschlag zur Einführung marktorientierter und spezialisierter Plattformträger mit „Krabben-Servicezonen“, die kostenlose OpenClaw-Bereitstellungsdienste anbieten.", "fr": "Titre : Shenzhen prévoit de mettre en place des politiques encourageant la « culture d’écrevisses », soutenant le développement d'OpenClaw et d'OPC\nContenu : Proposition de lancement de plateformes porteuses commercialisées et spécialisées offrant des « zones de service écrevisses » fournissant gratuitement des services de déploiement OpenClaw.", "it": "Titolo: Shenzhen prevede di introdurre politiche per incoraggiare l'"allevamento di aragoste", sostenendo lo sviluppo di OpenClaw e OPC\nContenuto: Proposta di lancio di piattaforme commerciali e specializzate che offrono \"zone di servizio per aragoste\" fornendo servizi di implementazione OpenClaw gratuiti." } <|im_end|> |
致命
[准确性]
"Crawfish Farming / Crawfish Service Zones"
理由: Misses the slang meaning. '龙虾' refers to OpenClaw, not crawfish/lobster farming. Should be 'deploying OpenClaw'. | Here "养龙虾" is a metaphor for deploying OpenClaw; translating it literally as crawfish farming and service zones distorts the intended meaning of OpenClaw-related services. | Literal translation of the slang '养龙虾' (deploying OpenClaw). The context is about software deployment, not agriculture. |
| google/gemma-3-1b-it | The Shenzhen government is proposing to issue policies encouraging 'raising lobsters', supporting the development of OpenClaw and OPC platforms. | 通过 (无共识错误) |
| google/gemma-3-4b-it | Shenzhen Plans to Introduce Policies to Encourage ‘Raising Lobsters,’ Supporting OpenClaw & OPC Development |
严重
[准确性]
"‘Raising Lobsters,’"
理由: Failed to translate slang '养龙虾' as 'deploying OpenClaw'. Literal translation changes the meaning entirely. | Should be 'deploying OpenClaw' not 'raising lobsters' | “养龙虾” is slang for deploying/hosting OpenClaw, not literally raising lobsters; key metaphorical meaning lost. |
| google/translategemma-12b-it | "en": "Title: Shenzhen Plans to Introduce Policies to Encourage \"Lobster Farming\", Supporting the Development of OpenClaw & OPC.", "zh-TW": "標題:深圳市擬推出政策,鼓勵「養龍蝦」,並支持 OpenClaw 與 OPC 的發展。", "ja": "タイトル:深セン市は、「エビの養殖」を奨励し、OpenClawおよびOPCの開発を支援する政策を導入予定。", "ko": "제목: 선전시, '새우 양식' 장려 정책 도입 예정, OpenClaw 및 OPC 개발 지원", "de": "Titel: Shenzhen plant die Einführung von Richtlinien zur Förderung der „Garnelenzucht“, um die Entwicklung von OpenClaw und OPC zu unterstützen.", "fr": "Titre : Shenzhen prévoit d'adopter des politiques visant à encourager l'"élevage d'écrevisses", afin de soutenir le développement d'OpenClaw et d'OPC.", "it": "Titolo: Shenzhen prevede di introdurre politiche per incentivare l'\"allevamento di gamberi\", sostenendo lo sviluppo di OpenClaw e OPC." } <|file_separator|> | 通过 (无共识错误) |
| google/translategemma-4b-it | Shenzhen Plans to Introduce Policies to Encourage "Lobster Farming", Supporting OpenClaw & OPC Development |
严重
[准确性]
""Lobster Farming""
理由: Mistranslation - '养龙虾' is slang for deploying/running OpenClaw, not literal 'lobster farming'. Should be 'deploying OpenClaw' not 'Lobster Farming'. | The translation uses 'lobster farming' instead of the correct term 'deploying OpenClaw'. | Literal translation of the slang '养龙虾'. It should be translated as 'deploying OpenClaw' or similar to reflect the technical context, as done in the reference. | In this context "养龙虾" is community slang for deploying/operating OpenClaw, not literal lobster farming; using the literal phrase distorts the intended meaning. |
| tencent/HY-MT1.5-1.8B | Login |
致命
[准确性]
"Login"
理由: The translation is completely incorrect and misrepresents the content. | The hypothesis translates neither the title nor the content. It appears to be a hallucinated or misplaced label ('Login') unrelated to the source text about Shenzhen policies and OpenClaw. | Hypothesis is completely unrelated to source; source is about Shenzhen policy for OpenClaw deployment, hypothesis is 'Login' | 'Login' is completely unrelated to the source about Shenzhen policies and OpenClaw deployment. |
| tencent/HY-MT1.5-7B | "en": "Title: Shenzhen Plans to Introduce Policies to Encourage ‘Lobster Farming’ and Support the Development of OpenClaw & OPC", "zh-TW": "標題:深圳擬出政策鼓勵「養龍蝦」,支持OpenClaw與OPC發展", "ja": "標題:深センが「ロブスター養殖」を奨励する政策を導入する予定で、OpenClawおよびOPCの発展を支援", "ko": "제목: 심천이 “로브스터 양식”을 장려하는 정책을 도입할 예정으로, OpenClaw 및 OPC의 발전을 지원", "de":="Titel: Shenzhen plant, Politiken einzuführen, um die „Lobsterzucht“ zu fördern und die Entwicklung von OpenClaw & OPC zu unterstützen", "fr":="Titre : Shenzhen prévoit d’introduire des politiques pour encourager l’„élevage de homards“ et soutenir le développement d’OpenClaw & OPC", "it":="Titolo: Shenzhen intende introdurre politiche per incentivare la “coltivazione di aragoste” e sostenere lo sviluppo di OpenClaw & OPC"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: AionUi V1.8.7 Minor Update: Added New API Model Platform, FlyBook channel, and also built a one-stop service for Dragonfly🐌... |
致命
[准确性]
"Dragonfly🐌"
理由: The source uses '龙虾🦞' (lobster/crayfish) as slang for OpenClaw. The hypothesis mistranslates this as 'Dragonfly' and uses a snail emoji (🐌), completely losing the intended meaning and introducing hallucinations. | Mistranslates “小龙虾” (a metaphor for OpenClaw) as “Dragonfly” and uses a snail emoji, completely changing the referent; this does not match the one-stop OpenClaw experience service. | Should be 'OpenClaw' not 'Dragonfly'. Also wrong emoji - should be 🦞 (lobster) not 🐌 (snail). |
| CohereLabs/tiny-aya-water | Thankfully there have been several GitHub collaborations to help out recently... arranged Agent auto review PRs | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: AionUi V1.8.7 minor update: Added NewAPI model platform, added Feishu channel, and created a comprehensive service for lobster 🦞 experience... Content: Fortunately, there have been several GitHub collaborators recently to help... Arranged Agent for automatic PR review | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: AionUi V1.8.7 Minor Update: Added NewAPI Model Platform, Added Feishu Channel, and Provided Lobster🦞 Experience One-Stop Service... Content: Fortunately, several new GitHub collaborators have joined to help recently... Arranged Agent for Automatic PR Review |
Major
[Accuracy]
"Lobster🦞 Experience One-Stop Service"
Reason: Mistranslation of '龙虾' (OpenClaw slang). Should be 'OpenClaw experience one-stop service', not 'Lobster experience'. | Failed to translate the slang '龙虾' to its intended meaning 'OpenClaw'. While 'Lobster' is the literal animal, in this tech context, it refers to the software tool. | The translation is correct but could be more precise. |
| Qwen/Qwen3-14B | Aggiornamento minore di AionUi V1.8.7: aggiunta della piattaforma NewAPI, aggiunto il canale Feishu e creata un'esperienza completa per le aragoste 🦞... |
Major
[Accuracy]
"esperienza completa per le aragoste 🦞"
Reason: The hypothesis only translates the title and omits the content about GitHub contributors and automatic PR reviews. | Translation is in Italian instead of Chinese. Target language should be Chinese (zh). | The term 'aragoste' (crabs) is a mistranslation of '龙虾' (lobsters). | ‘龙虾’ here is slang for OpenClaw; rendering it as literal lobsters loses the software meaning. |
| Qwen/Qwen3.5-4B | "en": "Title: AionUi V1.8.7 Minor Update: Added NewAPI Model Platform, integrated Feishu channel, and created an all-in-one Lobster 🦞 experience service...",\n \"zh-TW\": \"標題:AionUi V1.8.7 小幅更新:新增 NewAPI 模型平台、整合飛書頻道,並打造了一條龍的蝦🦞體驗服務...\",\n \"ja\": \"タイトル:AionUi V1.8.7 のマイナーアップデート:NewAPI モデルプラットフォームを追加、Feishu チャンネルを統合し、一貫したロブスター🦞体験サービスを作成しました...\",\n \"ko\": \"제목: AionUi V1.8.7 소규모 업데이트: NewAPI 모델 플랫폼 추가, Feishu 채널 통합, 그리고 로브스터🦞 일괄 서비스 제공...\",\n \"de\": \"Titel: AionUi V1.8.7 Kleine Aktualisierung: Neue API-Modellplattform hinzugefügt, Feishu-Kanal integriert und ein All-in-One-Lobster🦞Erfahrungspaket erstellt...\",\n \"fr\": \"Titre : AionUi V1.8.7 Mise à jour mineure : Ajout de la plateforme modèle NewAPI, intégration du canal Feishu et création d'un service d'expérience tout-en-un avec un homard🦞...\",\n \"it\": \"Titolo: AionUi V1.8.7 Aggiornamento minore: Aggiunta piattaforma modello NewAPI, integrato canale Feishu e creato servizio esperienza tutto-in-uno con un gambero🦞..." } |
Major
[Accuracy]
"an all-in-one Lobster 🦞 experience service"
Reason: Source and reference refer to an all-in-one OpenClaw experience; rendering as literal “Lobster” loses the product name and intended meaning. | Should be 'OpenClaw experience service'; 'lobster' is slang for OpenClaw, not literal lobster | Failed to recognize '龙虾' as slang for 'OpenClaw'. Translated literally as the animal. |
| Qwen/Qwen3.5-9B | Title: AionUi V1.8.7 Minor Update: Added NewAPI Model Platform, Integrated Feishu Channel, and Launched a One-Stop Service Experience for Lobster 🦞... |
Major
[Accuracy]
"One-Stop Service Experience for Lobster 🦞"
Reason: "龙虾" is a slang reference to OpenClaw; should indicate an OpenClaw experience rather than literal lobster. | Literal translation of slang '龙虾...服务'. Should be 'OpenClaw experience service' to maintain technical accuracy. | Should be 'OpenClaw' not 'Lobster'; missing the actual product name |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | AionUI V1.8.7 Update: Added NewAPI model platform, added Feishu channel, and created a shrimp experience for one-stop service... |
Major
[Accuracy]
"created a shrimp experience for one-stop service"
Reason: ‘龙虾’ here refers to OpenClaw ecosystem; translating as ‘shrimp experience’ loses the product/tool reference present in the reference. | The hypothesis translates '龙虾' as 'shrimp' but should be 'OpenClaw'. Also 'one-stop service' is awkward; should be 'all-in-one OpenClaw experience service'. | The translation omits the word 'minor' from the title, which is present in the reference. | The hypothesis only translates the title and ignores the content text ('庆幸的是最近多了好几个 github 搭子来帮忙...').
Major
[Terminology]
"shrimp experience"
Reason: Incorrect translation - '小龙虾' here refers to OpenClaw (software), not literal shrimp. Should be 'OpenClaw' | The term 'shrimp' is used instead of the more accurate 'OpenClaw' as per the reference. | Mistranslation of '龙虾'. It refers to 'OpenClaw', not literal shrimp. 'Shrimp experience' loses the technical context. |
| google/gemma-3-12b-it | Title: AionUi V1.8.7 Minor Update: Added NewAPI model platform, integrated Feishu channel, and built an end-to-end lobster 🦞 service experience... |
Major
[Accuracy]
"lobster 🦞 service experience"
Reason: Mistranslation of '一条龙服务' (yī tiáo lóng fúwù). The source uses '龙虾' (lobster) as a pun/wordplay, but the correct translation should be 'all-in-one' or 'complete' service, not literally 'end-to-end lobster service'. The reference correctly interprets this as 'all-in-one' service. | Failed to translate the slang '龙虾' as 'OpenClaw'. Interpreted literally. | Original refers to an all‑in‑one OpenClaw experience; mapping this to literal "lobster" loses the product/tool name and changes the meaning. |
| google/gemma-3-1b-it | Lucky is that several GitHub contributors have come to help recently... |
Major
[Fluency]
"Lucky is that several GitHub contributors have come to help recently..."
Reason: Unnatural English word order; should be “Fortunately, several GitHub contributors have come to help recently...” or similar. | Unnatural phrasing. Should be 'Fortunately' or 'I'm lucky that'. | Awkward phrasing - should be 'Fortunately, several GitHub buddies have come to help recently' or similar. Also missing the second part about Agent auto-review. |
| google/gemma-3-4b-it | Title: AionUi V1.8.7 Small Update: New API Model Platform Added, Added Slack Channel, and Created a Lobster 🦞 End-to-End Service... |
Major
[Accuracy]
"Slack Channel"
Reason: Should be 'added Feishu channel' not 'Slack channel' | Mistranslation of '飞书' (Feishu/Lark) as 'Slack'. These are different platforms. | “飞书” refers to Feishu (Lark), not Slack; wrong product name.
Major
[Accuracy]
"Lobster 🦞 End-to-End Service"
Reason: Should be 'created an all-in-one OpenClaw experience service' not generic 'Lobster End-to-End Service' | Failed to translate slang '龙虾' as 'OpenClaw'. Literal translation is incorrect in this technical context. | Misrepresentation of '龙虾🦞体验一条龙服务' as 'Lobster 🦞 End-to-End Service' instead of 'OpenClaw experience service' | “龙虾体验一条龙服务” refers to an all-in-one OpenClaw experience, not generic lobsters; OpenClaw should be mentioned. |
| google/translategemma-12b-it | Title: AionUi V1.8.7 Minor Update: Added NewAPI model platform, integrated Feishu channel, and created an end-to-end lobster 🦞 service experience... |
Major
[Terminology]
"created an end-to-end lobster 🦞 service experience"
Reason: Incorrect terminology: 'lobster' should be 'OpenClaw' - in this context, 龙虾 is slang for OpenClaw (AI coding tool), not actual lobster | "龙虾" is community slang for OpenClaw; translating it as literal lobster misrepresents the meaning of an OpenClaw experience service. | Should be 'all-in-one OpenClaw experience service' instead of 'end-to-end lobster 🦞 service experience'. |
| google/translategemma-4b-it | "en": "Title: AionUi V1.8.7 Small Update: New API model platform added, plus a Slack channel, and a comprehensive \"Lobster\" experience one-stop service...", "zh-TW": "標題:AionUi V1.8.7 小更新:新增新 API 模型平台、加上飛書頻道,並打造了一個完整的「龍蝦」體驗一站式服務...", "ja": "タイトル:AionUi V1.8.7 小アップデート:新しいAPIモデルプラットフォームを追加、さらにSlackチャンネルを追加し、「ロブスター」体験の一括サービスを構築...", "ko": "제목: AionUi V1.8.7 소 업데이트: 새로운 API 모델 플랫폼 추가, 더불어 슬랙 채널 추가 및 '로브스터' 경험을 위한 통합 서비스 구축...", "de": "Titel: AionUi V1.8.7 Kleine Aktualisierung: Neue API-Modellplattform hinzugefügt, außerdem ein Slack-Kanal und eine umfassende \"Lobst"-Erlebnis-One-Stop-Service...", "fr": "Titre : AionUi V1.8.7 Petite mise à jour : Nouvelle plateforme de modèles API ajoutée, ainsi qu'un canal Slack et un service complet de « expérience crabe »...", "it": "Titolo: AionUi V1.8.7 Piccola aggiornamento: Nuova piattaforma di modelli API aggiunta, inoltre un canale Slack e un servizio completo di \"esperienza gambero\"..." } <|file_separator|> |
Major
[Accuracy]
"plus a Slack channel"
Reason: The translation incorrectly translates 'OpenClaw' as 'Lobster'. | Multiple errors: (1) '飞书' is Feishu, not Slack; (2) '龙虾' refers to OpenClaw, not literal 'Lobster'; (3) Should reference 'NewAPI' not generic 'New API model platform'. | "飞书" refers to Feishu (Lark), not Slack; wrong product name. |
| tencent/HY-MT1.5-1.8B | Login |
Critical
[Accuracy]
"Login"
Reason: 'Login' is not a translation of the source about AionUi updates and OpenClaw service. | Hypothesis is completely unrelated to source; source is about AionUi update with OpenClaw features, hypothesis is 'Login' | The hypothesis is completely unrelated to the source text, which discusses a software update. 'Login' is a hallucination or error. | The translation is completely incorrect and misrepresents the content. |
| tencent/HY-MT1.5-7B | "en": "Title: AionUi V1.8.7 Minor Update: New API Model Platform Added, Lark Channel Integrated, and a Comprehensive Lobster 🦞 Experience Service Launched...", "zh-TW": "标题:AionUi V1.8.7小更新:新增NewAPI模型平台、整合了Lark频道,同时还推出了全方位的龙虾🦞体验服务…", "ja": "タイトル:AionUi V1.8.7のマイナーアップデート:NewAPIモデルプラットフォームの追加、Larkチャンネルの統合、そして包括的なロブスター🦞体験サービスの開始…", "ko": "제목: AionUi V1.8.7 소규모 업데이트: NewAPI 모델 플랫폼 추가, Lark 채널 통합, 그리고 종합적인 로브스터🦞 체험 서비스 출시…", "de":="Titel: AionUi V1.8.7 kleine Aktualisierung: Hinzufügung einer neuen NewAPI-Modellplattform, Integration des Lark-Channels und Einführung eines umfassenden Lobster🦞 Erlebnisservices…", "fr":="Titre : AionUi V1.8.7 mise à jour mineure : Ajout d’une nouvelle plateforme de modèles NewAPI, intégration du canal Lark et lancement d’un service complet d’expérience avec le lobster🦞…", "it":="Titolo: AionUi V1.8.7 aggiornamento minore: aggiunta di una nuova piattaforma per modelli NewAPI, integrazione del canale Lark e lancio di un servizio completo per l’esperienza con il lobster🦞…"} |
Major
[Accuracy]
"Lobster 🦞 experience service"
Reason: "龙虾🦞体验一条龙服务" refers to an all-in-one OpenClaw experience, not to lobsters literally; the metaphor should be preserved as OpenClaw-related. | Literal translation of slang '龙虾...体验'. This refers to an 'OpenClaw experience', not a culinary or zoological lobster experience. | Source uses '龙虾' (lobster) as slang for OpenClaw, but hypothesis translates it as literal 'lobster' instead of 'OpenClaw' |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: CC calls GPT-5.4 Unstoppable! |
Major
[Accuracy]
"CC calls GPT-5.4 Unstoppable!"
Reason: The overall meaning is close, but “Unstoppable!” is slightly exaggerated; although stylistically close to “强无敌”, it still intensifies the tone a little, without affecting the main message. | The hypothesis omits the entire content text which details the specific behavior of skills and sub-agents triggered by '/init'. | Mistranslation: 'CC' refers to 'Claude Code', not 'CC'. The translation 'CC calls GPT-5.4' is incorrect - should be 'Claude Code calling GPT-5.4'. | Should be 'Claude Code calling GPT-5.4'. 'CC' is abbreviated but reference clarifies it means Claude Code. |
| CohereLabs/tiny-aya-water | Title: CC calls GPT-5.4 Unstoppable! |
Major
[Accuracy]
"Title: CC calls GPT-5.4 Unstoppable!"
Reason: Should be 'Claude Code calling GPT-5.4 is incredibly powerful'. 'CC' is unclear; reference clarifies as 'Claude Code' and 'Unstoppable' doesn't match 'incredibly powerful' | Complete omission of the source content paragraph. | Incorrect: should be 'GPT-5.4' with hyphen, not 'gpt 5.4' | Misses that "CC" here refers to Claude Code, not clearly expressed; also "Unstoppable" slightly exaggerates nuance vs. "incredibly powerful" but impact is limited. |
| Qwen/Qwen2.5-14B-Instruct | Title: CC Call gpt-5.4 Super Powerful! Content: After enabling gpt 5.4, I found that one /init and a bunch of SKILLS and sub-agents were all activated... Is cx's command so terrifying to follow?? |
Major
[Accuracy]
"cx's command"
Reason: "cx" here stands for Codex; translating as "cx's command" without explanation loses the intended product name, and "command" is less accurate than "instruction". | Reference says 'Codex's instruction following' but hypothesis says 'cx's command' - 'cx' is unclear abbreviation | cx should be translated as 'Codex' (reference) not left as 'cx'. This is an acronym that needs proper expansion in English. | Failed to expand 'cx' to 'Codex' (or Claude Code depending on context, but reference uses Codex), leaving an obscure abbreviation. |
| Qwen/Qwen2.5-7B-Instruct | Title: CC invoking gpt-5.4 unbeatable! Content: After enabling gpt 5.4, I found that an /init, a bunch of SKILLs and sub-agents were awakened... Do cx commands follow such terrifying rules?!! |
Major
[Accuracy]
"cx"
Reason: Mistranslation of 'cx的指令遵循'. 'cx' refers to Codex, not a generic command. Should be 'Is Codex's instruction following really this terrifying?' | Failed to expand or translate the abbreviation 'cx', which refers to 'Codex' (as confirmed by the reference and context 'CC' = Claude Code vs Codex). | Refers to Codex, not 'cx'; left untranslated and unclear. | CC should be 'Claude Code' not just 'CC'. The reference correctly identifies it as 'Claude Code'. |
| Qwen/Qwen3-14B | Titolo: CC chiama gpt-5.4 estremamente potente! | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: CC calls gpt-5.4 and is unstoppable! Content: After enabling GPT 5.4, I found that just one /init command triggers a cascade of SKILLS and sub-agents... Is CX's instruction following this terrifying? |
Major
[Accuracy]
"CX's instruction following"
Reason: The source text has a typo 'cx', which contextually refers to 'Codex' (as confirmed by the reference). The hypothesis transliterates the typo as 'CX' instead of correcting it to 'Codex' based on context. | Should be 'Codex's instruction following'; 'CX' is unclear abbreviation | The reference indicates “Codex's instruction following”; ‘cx’ here refers to Codex, not an entity named CX. Using “CX” is unclear/misleading in English. | Incorrect entity: 'CX' should be 'Codex' as in reference.
Major
[Accuracy]
"CC calls gpt-5.4"
Reason: Should be 'Claude Code calling GPT-5.4'; 'CC' abbreviation is unclear, reference clarifies as 'Claude Code' | The translation uses 'unstoppable' which is less precise than the reference's 'incredibly powerful'. | Incorrect entity: 'CC' should be 'Claude Code' as in reference, not 'CC'. |
| Qwen/Qwen3.5-9B | Title: CC calls gpt-5.4 and it's unstoppable! Content: After enabling gpt 5.4, I found that one /init triggered a bunch of SKILLS and sub-agents to wake up... Is cx's instruction following this terrifying? |
Major
[Accuracy]
"cx's instruction following"
Reason: "cx" in the community context refers to Codex, not an unexplained "cx"; target audience may misinterpret. | Should be 'Is Codex's instruction following really this terrifying??' not 'cx's' | Failure to resolve the abbreviation 'cx' to 'Codex' (or Claude Code depending on context, but reference uses Codex), leading to confusion. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: CC Call to gpt-5.4 Unstoppable! |
Major
[Accuracy]
"CC Call to gpt-5.4 Unstoppable!"
Reason: The hypothesis uses 'CC' but should be 'Claude Code'. Also 'Unstoppable' is not accurate for '强无敌' which means 'incredibly powerful' not 'unstoppable'. | “强无敌” is rendered as “Unstoppable”, which is close but slightly changes nuance from ‘super strong/invincible’ to ‘cannot be stopped’; meaning largely preserved. | The hypothesis completely omits the translation of the content text ('开了 gpt 5.4 后发现...'). |
| google/gemma-3-12b-it | Title: CC Calling gpt-5.4 Super Invincible! |
Major
[Accuracy]
"CC Calling gpt-5.4"
Reason: Should be 'calling' instead of 'Calling'. | "CC" in context is "Claude Code"; leaving only the abbreviation slightly reduces clarity compared to the explicit name present in reference. | Missing the content translation entirely. | Incorrect: 'CC' should be 'Claude Code' - CC is abbreviation for Claude Code, not kept as 'CC'. Also missing Content translation entirely | Incomplete translation. The hypothesis only translates the title but omits the content entirely. Additionally, 'CC' is not properly identified as 'Claude Code' as shown in the reference, and 'Super Invincible' is a less accurate translation of '强无敌' compared to 'incredibly powerful'. |
| google/gemma-3-1b-it | Translation of 'CC 调用 gpt-5.4 强无敌!' in en |
Critical
[Accuracy]
"Translation of 'CC 调用 gpt-5.4 强无敌!' in en"
Reason: The model outputted a meta-statement describing the task instead of performing the translation. | The translation is incomplete and does not provide the actual content of the source text. | Translation is a generic placeholder 'Translation of...' instead of actual translation. Also 'CC' should be 'Claude Code' not 'CC'. Content section is completely omitted | Not a translation at all - this is a meta-description rather than actual translation. Should be 'Claude Code calling GPT-5.4 is incredibly powerful!' Missing entire content section. | This is meta-text, not a translation. It fails to render the actual meaning (“Claude Code calling GPT-5.4 is incredibly powerful!”). |
| google/gemma-3-4b-it | Title: CC Calling gpt-5.4 Powerful and Unbeatable! |
Major
[Accuracy]
"CC Calling gpt-5.4 Powerful and Unbeatable!"
Reason: “强无敌” is better rendered as “incredibly powerful” or similar; title also omits that CC here means Claude Code calling GPT-5.4, which may cause ambiguity. | Missing content translation entirely; only title provided | Reference says 'GPT 5.4' with space, hypothesis uses lowercase without space |
| google/translategemma-12b-it | Title: Calling gpt-5.4 - incredibly powerful! |
Major
[Accuracy]
"Title: Calling gpt-5.4 - incredibly powerful!"
Reason: Omission of subject "CC"/Claude Code calling GPT-5.4 and missing nuance that CC is the caller, not a generic call; weakens the specific reference. | The hypothesis omits the subject 'CC' (Claude Code) which is crucial for context, and completely fails to translate the 'Content' section of the source text. | The hypothesis omits 'CC' (Claude Code) from the title. The reference correctly shows 'Claude Code calling GPT-5.4' which provides important context about which tool is being discussed. |
| google/translategemma-4b-it | Title: Calling CC with gpt-5.4 - Unstoppable! |
Major
[Accuracy]
"Title: Calling CC with gpt-5.4 - Unstoppable!"
Reason: Mistranslation - 'CC' should be 'Claude Code' based on context. Also '强无敌' means 'incredibly powerful' not 'Unstoppable'. The reference shows 'Claude Code calling GPT-5.4 is incredibly powerful'. | Source implies "Claude Code calling gpt-5.4"; "Calling CC with gpt-5.4" reverses the caller/callee relationship slightly, though main intent (very powerful) is preserved. | Ambiguous phrasing. The source implies 'CC' (Claude Code) is utilizing or calling 'gpt-5.4'. The hypothesis sounds like an external agent is calling CC. Also, 'cx' in source likely refers to Codex/Claude, which the reference clarifies, while the hypothesis ignores the content entirely. |
| tencent/HY-MT1.5-1.8B | Translation of 'CC 调用 gpt-5.4 强无敌!' in en |
Major
[Accuracy]
"Translation of 'CC 调用 gpt-5.4 强无敌!' in en"
Reason: Hypothesis is not a translation but a placeholder text indicating translation request | The hypothesis is a meta-description of the task rather than the actual translation. It fails to translate the source title or content. | Placeholder string; no actual translation provided for the sentence about Claude Code calling GPT-5.4 being very powerful. |
| tencent/HY-MT1.5-7B | "en": "Title: CC Calling GPT-5.4: Invincible!", "zh-TW": "标题:CC调用GPT-5.4:无敌!", "ja": "标题:CCがGPT-5.4を呼び出す:無敵!", "ko": "제목: CC가 GPT-5.4를 호출한다:무적!", "de":="Titel: CC ruft GPT-5.4 auf: Unbesiegbar!", "fr": "Titre : CC appelle GPT-5.4 : Invincible!", "it": "Titolo: CC chiama GPT-5.4: Invincibile!"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: The user is asking how to find previously registered free API sites. |
Major
[Accuracy]
"The user is asking how to find previously registered free API sites."
Reason: The translation does not address the title as instructed and instead provides a content translation. | Missing title translation. Should include: 'I can't remember which free API proxy sites I've registered for, how can I find them?' | Ignores specific details such as “linux账号” and “佬友分享的公益站”, generalizes “公益站” to “free API sites”, and loses the proxy/relay context and the information about where the account comes from. |
| CohereLabs/tiny-aya-water | Title: I don't remember which free API sites I registered. How do I find them? |
Minor
[Accuracy]
"free API sites"
Reason: Omits the source content which provides crucial context: 'registered... using linux account' and 'shared by bros'. The hypothesis only translates the title. | The translation omits 'proxy' in 'free API proxy sites', which is less precise. | Source mentions 公益站 that are specifically "free API proxy sites"; omission of "proxy" slightly under-specifies what the sites do. |
| Qwen/Qwen2.5-14B-Instruct | Title: Can't remember which free shared API sites I registered, how can I find them? Content: As the title says, I previously used my Linux account to register some free shared API sites that friends shared, but now I can't quite recall which ones I registered. |
Minor
[Style]
"friends"
Reason: "佬友" is closer to bros/peer hackers; "friends" weakens the colloquial, in-group tone slightly. | Under-translated '佬友' which implies a specific community of experts/bros, not just generic friends. | Reference uses 'bros' which is the community-specific term |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: Can't remember which public proxies I registered with? How to find them?\nContent: As per the question, I registered some free shared API sites using my Linux account previously, but now I can't recall all of them and don't remember which ones I registered with.", "zh-TW": "標題:忘了註冊過哪些公益站?如何找到它们?\n內容:如題所述,我曾使用Linux帳號註冊了一些大神分享的公益站,但現在記不太清楚了,也不記得註冊了哪些", "ja": "タイトル:どの公益サイトに登録したか忘れてしまった、どうやって見つけるのか? 内容:質問の通り、以前Linuxアカウントでいくつかの大物さん提供の無料APIサイトに登録しましたが、今はどのサイトを登録したか思い出せません", "ko": "제목: 어떤 공익站在注册了吗?怎么找回它们? 내용: 질문대로, 이전에 Linux 계정으로 몇몇의大佬分享的免费共享API站点进行了注册,但现在记不清楚都注册了哪些", "de": "Titel: Kann ich mich nicht mehr daran erinnern, bei welchen Public Proxies ich registriert bin? Wie kann ich sie finden? Inhalt: Wie im Titel beschrieben, habe ich einige kostenlose geteilte API-Plattformen mit meinem Linux-Konto von einigen großartigen Freunden registriert, aber jetzt kann ich mich nicht mehr an alle erinnern und weiß nicht, welche ich registriert habe.", "fr": "Titre : Oublié les stations publiques auxquelles j'ai inscrit mon compte ? Comment les retrouver ? Contenu : Comme indiqué dans le titre, j'avais inscrit quelques API gratuites partagées avec mon compte Linux à l'aide d'amis généreux, mais maintenant je ne me souviens plus de tous et ne sais pas lesquelles j'ai inscrites.", "it": "Titolo: Non riesco più a ricordare quali API gratuite ho registrato? Come trovarle? Contenuto: Come nel titolo, ho registrato alcune API gratuite condivise utilizzando il mio account Linux da alcuni amici generosi, ma ora non mi ricordo tutti e non so quali ho registrato." } |
Minor
[Accuracy]
"public proxies"
Reason: The translation is correct but could be more precise. | '公益站' in this context usually refers to free API proxy sites specifically, 'public proxies' is slightly generic but acceptable. However, the format error is the primary issue. | Source is '公益站' meaning free/public API proxy sites, not general 'public proxies'. |
| Qwen/Qwen3-14B | Ho dimenticato quali siti proxy gratuiti ho registrato, come posso trovarli? |
Major
[Accuracy]
"siti proxy gratuiti"
Reason: Source refers to ‘公益站’ free API proxy sites; “proxy” is added but acceptable; however the API aspect is omitted, making it less precise. | Translation is in Italian instead of Chinese. Target language should be Chinese (zh). | The hypothesis only translates the title and omits the content explaining the use of a LINUX DO account and the specific forgetfulness about which sites were registered. |
| Qwen/Qwen3.5-4B | Title: Can't remember which free shared API sites I registered, how do I find them? Content: As mentioned in the title, I previously registered some public proxies shared by fellow users on Linux accounts, but now I can't quite recall which ones. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | I forgot which free shared API sites I registered with my Linux account; how can I find them? |
Minor
[Accuracy]
"free shared API sites"
Reason: Source specifically refers to "公益站" shared by bros and implies API proxy sites; "free shared API sites" is close but drops "proxy" nuance from reference. | Slight imprecision; '公益站' in this context is often 'free API proxy sites' as per reference, though 'shared' is acceptable, 'proxy' is more precise technically. | Missing title context; should be 'I can't remember which free API proxy sites I've registered for, how can I find them?' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: I don't remember which public proxies I registered with before, how can I find out? | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: I don't remember which free API sites I registered for, how can I find them? Content: As above, I registered for some free API sites shared by old friends using my Linux account before, but now I can't remember some of them and don’t remember which ones I registered for. |
Major
[Accuracy]
"Linux account"
Reason: Mistranslation of 'linux账号'. The reference correctly identifies this as 'LINUX DO account', which is a specific platform name, not just a generic Linux account. | 'linux 账号' in this context likely refers to the 'LINUX DO' forum account, not a generic Linux OS account. | Source is "linux账号" but reference clarifies "LINUX DO account" (a specific service); translating as generic Linux account loses this specific meaning. |
| google/gemma-3-1b-it | As requested, here is the translation of the input string into English. |
Critical
[Accuracy]
"As requested, here is the translation of the input string into English."
Reason: Not a translation - this is a meta-statement. No actual translation provided. Should translate the title and content about finding registered free API proxy sites. | The model outputted a meta-statement instead of translating the source text. | The translation is incomplete and does not provide the actual content of the source text. | No actual translation is provided; it only states that a translation would be given, completely omitting the source meaning. |
| google/gemma-3-4b-it | I forgot which public API sites I registered with my Linux account. How can I find them? |
Major
[Accuracy]
"public API sites"
Reason: '公益站' in this context usually refers to 'free proxy sites' or 'community-run free services', 'public API sites' is slightly off but acceptable. | Missing context; should specify 'free API proxy sites' and mention 'LINUX DO account' not just 'Linux account' | “公益站” in this context are free API proxy sites, not generic public API sites; important nuance about proxy/gateway is lost. |
| google/translategemma-12b-it | Title: I don't remember which free API sites I registered with, how can I find them? Content: As mentioned above, I used my Linux account to register with some free API sites shared by other users, but now I can't remember some of them and I don't remember which ones I registered with. |
Minor
[Accuracy]
"Linux account"
Reason: Should be 'LINUX DO account' - this is a specific platform name, not a generic Linux account | The source says 'linux 账号' which in this specific community context likely refers to the 'LINUX DO' forum account (as per reference), not a generic Linux OS account. However, without more context, this is a minor nuance. | Source is "LINUX DO" account, a specific service; simplifying to generic Linux account is slightly misleading. |
| google/translategemma-4b-it | Title: I can't remember which public API sites I registered with, how do I find them? | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 不记得自己注册过哪些公益站了,怎么找到?' in en |
Major
[Accuracy]
"Translation of 'Title: 不记得自己注册过哪些公益站了,怎么找到?' in en"
Reason: Hypothesis is not a translation but a placeholder text indicating translation request | The hypothesis is a meta-description string, not a translation. It fails to convey the meaning of the source text regarding forgotten registrations. | Placeholder; does not give the actual meaning about not remembering which free API proxy sites were registered. |
| tencent/HY-MT1.5-7B | Title: I can't remember which public welfare sites I registered on, how can I find them? | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | RT, feels like there's too many new things this year. No heart for work. | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: What about 'Blow Water'? I'm having too much fun with the crabs, I have no heart to work, just want to play 🦞 |
Major
[Accuracy]
"'Blow Water'"
Reason: "吹水" is forum slang for casual chat/chatter, not 'Blow Water'; literal rendering is misleading. | The translation uses slang ('Blow Water') instead of the proper term 'OpenClaw'. | Should be '[Chatter]' not '[Blow Water]'. '吹水' means chatter/casual discussion, not literal water blowing | Mistranslation of '吹水' (chatting/chatter). 'Blow Water' is a literal and incorrect translation of the Cantonese slang used here for casual talk.
Major
[Accuracy]
"crabs"
Reason: Should be 'OpenClaw'. '龙虾' is a metaphor for OpenClaw, not literal crabs | Again, fails to translate the slang '小龙虾' (OpenClaw) correctly, rendering it as literal crabs. | "小龙虾" here is slang for OpenClaw, not actual crabs or shrimp; mistranslated the metaphor. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Chat] What to do, the little crayfish is so fun, can't focus on work at all, just want to play 🦞 Content: Retweet, there are too many new things this year. Can't focus on work. |
Major
[Accuracy]
"little crayfish"
Reason: Failed to translate slang '小龙虾' (OpenClaw). Literal translation ruins the meaning. | Misses the slang metaphor. '小龙虾' refers to OpenClaw, not literal crayfish. Should be 'OpenClaw is too much fun' | "小龙虾" in this context is OpenClaw, not literal crayfish; conflicts with emoji usage referring to the tool. |
| Qwen/Qwen2.5-7B-Instruct | Title: 【吹水】What to do, the shrimp too fun, can't focus on work, just want to play🦞 Content: RT, feels like there are too many new things this year. Can't focus on work. |
Major
[Accuracy]
"the shrimp"
Reason: The translation is accurate but could be more natural. | Failed to translate '小龙虾' (slang for OpenClaw) correctly. Literal 'shrimp' makes no sense in the context of 'playing' at work. | Should be 'OpenClaw', not literal 'shrimp'. The reference correctly identifies this as OpenClaw. | '小龙虾' is slang for OpenClaw, not literal shrimp.
Major
[Fluency]
"too fun"
Reason: Grammatically incorrect. Missing verb 'is'. | Understandable but slightly unnatural; 'so much fun' would be more idiomatic. | Grammatically incorrect. Should be 'the shrimp is too fun' or better yet, recognize '龙虾' as OpenClaw slang. |
| Qwen/Qwen3-14B | 【Chiacchiere】Cosa devo fare? I gamberi sono troppo divertenti, non riesco a concentrarmi sul lavoro. Vorrei solo giocare al lavoro 🦞 |
Major
[Accuracy]
"I gamberi"
Reason: The term 'gamberi' (crabs) is a mistranslation of '龙虾' (lobsters). | The hypothesis only translates the title and omits the content about too many new gadgets this year and lack of motivation. | “小龙虾” here refers to OpenClaw, not literal shrimp; translating literally changes meaning. | Translation is in Italian instead of Chinese. Target language is zh (Chinese), not Italian. |
| Qwen/Qwen3.5-4B | Title: [Chitchat] What to do? Crayfish are so fun, I can't focus on work at all. At work, I just want to play 🦞 Content: RT, there seem to be too many new things this year. Can't stand being at work. |
Major
[Accuracy]
"Crayfish are so fun"
Reason: Should be 'OpenClaw is too much fun'; 'crayfish' is slang metaphor for OpenClaw | “小龙虾” is community slang for OpenClaw, not literal crayfish; shifts meaning from tool/service to an animal. | Failed to translate the slang '小龙虾' (little lobster/crayfish) which refers to 'OpenClaw'. The translation implies the user is playing with actual animals. |
| Qwen/Qwen3.5-9B | Title: [Chatter] What should I do? Crayfish are so fun, can't focus on work, only want to play at work 🦞 Content: RT, feel like there are too many new things this year. Can't focus on work. |
Major
[Accuracy]
"Crayfish are so fun"
Rationale: Should be 'OpenClaw is too much fun' not 'Crayfish are so fun'; misses slang meaning | Literal translation of slang '小龙虾'. Refers to 'OpenClaw', not the animal. | "小龙虾" is slang for OpenClaw, not actual crayfish. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Translation of '小龙虾' in en |
Critical
[Accuracy]
"Translation of '小龙虾' in en"
Rationale: Hypothesis is incomplete - only provides a generic description instead of actual translation content | The hypothesis is not a translation. It is a meta-comment describing a translation task. It fails to translate either the title or the content. | Does not provide an actual translation; just a meta description, failing to translate the title/content at all. | The hypothesis is not a translation at all - it's a meta-description. It should provide an actual English translation of the title, which should reference 'OpenClaw' not just describe what needs to be translated. |
| google/gemma-3-12b-it | Title: [Chatter] What to do, the crayfish are too fun, I can’t focus on work at all, I just want to play 🦞 Content: Seriously, there are so many new things this year. Can’t focus on work. |
Critical
[Accuracy]
"the crayfish are too fun"
Rationale: Misses slang meaning. '龙虾' refers to OpenClaw, not crayfish. Should be 'OpenClaw is too much fun'. | "小龙虾" here is community slang for OpenClaw, not literal crayfish; this mistranslates the main subject. | Failed to identify '小龙虾' as slang for 'OpenClaw'. Translated literally as the animal. |
| google/gemma-3-1b-it | What should I do? |
Minor
[Accuracy]
"What should I do?"
Rationale: Severely incomplete - only translates a fragment of the title. Missing the full title context about OpenClaw being too fun and the entire content section. | Severe under-translation. The hypothesis only translates the first phrase '怎么办啊' and completely omits the core message about 'OpenClaw' (小龙虾) being too fun and affecting work. | Only translates a small fragment of the title and omits key information about OpenClaw being too fun, lack of desire to work, and the context of new gadgets. | The translation is too vague and does not fully capture the source's context about OpenClaw being too fun. |
| google/gemma-3-4b-it | Title: [Playing Around] What to do? Little lobsters are so fun, I have no mind for work, and I just want to play 🦞 Content: RT, feels like there are too many new things this year. No desire to go to work. |
Major
[Accuracy]
"Little lobsters are so fun"
Rationale: “小龙虾” here is community slang for OpenClaw, not actual lobsters; literal translation misleads. | Misses slang context; should be '[Chatter] What should I do, OpenClaw is too much fun, I have no desire to work and just want to play with 🦞 at the office' | Failed to translate slang '小龙虾' as 'OpenClaw'. Literal translation makes the sentence nonsensical in a tech context. |
| google/translategemma-12b-it | Title: [Chat] What should I do? The mini lobsters are so fun that I can't focus on work at all; I just want to play 🦞 Content: As mentioned above, there are too many new things this year. I have no motivation to go to work. |
Major
[Terminology]
"mini lobsters"
Rationale: "小龙虾" is slang for OpenClaw, not literal small lobsters; this changes the meaning from a software/toy to animals. | Should be 'OpenClaw' instead of 'mini lobsters'. | Incorrect terminology: 'mini lobsters' should be 'OpenClaw' - in this context, 小龙虾 is slang for OpenClaw (AI coding tool), not actual lobster |
| google/translategemma-4b-it | Title: [Fun] What should I do? These small lobsters are too addictive, making me lose interest in work and just want to play 🦞 |
Major
[Accuracy]
"These small lobsters are too addictive"
Rationale: Mistranslation - '小龙虾' is slang for OpenClaw, not literal 'small lobsters'. Should be 'OpenClaw is too much fun' not 'small lobsters are too addictive'. | Literal translation of the slang '小龙虾'. It refers to the software 'OpenClaw', not the crustacean. 'OpenClaw' should be used. | The translation incorrectly translates 'OpenClaw' as 'lobsters'. | "小龙虾" is a community nickname for OpenClaw, not literal lobsters; the translation loses the software/tool reference. |
| tencent/HY-MT1.5-1.8B | Login |
Critical
[Accuracy]
"Login"
Rationale: The hypothesis is 'Login', which is completely unrelated to the source text about playing with 'OpenClaw' (lobster) and lacking motivation to work. | Hypothesis is completely unrelated to source; source is about OpenClaw being fun and distracting from work, hypothesis is 'Login' | 'Login' does not translate the title about OpenClaw being too fun and affecting work motivation. | The translation is completely incorrect and misrepresents the content. |
| tencent/HY-MT1.5-7B | "en": "Title: [Rumors] What should I do? The crayfish are so much fun that I can’t focus on work at all. All I want to do is play while at work 🦞", "zh-TW": "Title: [谣言] 我该怎么办?小龙虾太有趣了,我完全无法专心工作,上班时只想玩🦞", "ja": "Title: [噂] どうすればいいのでしょうか?小龍虾がとても楽しくて、仕事に集中できません。仕事中は遊びたくなるだけです🦞", "ko": "Title: [소문] 어떻게 해야 할까요?작은 게가 너무 재미있어서 일에 전혀 집중할 수 없습니다. 출근하면 그저 놀고 싶을 뿐입니다🦞", "de":="Titel: [Gerüchte] Was soll ich tun? Die kleinen Krebsse sind so unterhaltsam, dass ich mich nicht auf die Arbeit konzentrieren kann. Alles, was ich will, ist während der Arbeit zu spielen🦞", "fr":="Titre: [Rumeurs] Que faire ? Les petits crevettes sont tellement amusantes que je ne peux pas me concentrer sur mon travail. Tout ce que je veux, c’est jouer pendant le travail🦞", "it":="Titolo: [Voci] Cosa devo fare? I gamberetti sono così divertenti che non riesco a concentrarmi sul lavoro. Tutto quello che voglio è giocare mentre lavoro🦞"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: CC Switch Self-Improvement Version Share |
Major
[Accuracy]
"Self-Improvement Version"
Rationale: Should be 'Sharing my modified/enhanced version of CC-Switch for personal use'. The phrasing misses the personal/self-use aspect. | “自用” means "for one's own use", not "self-improvement"; it should be rendered as "for personal use" or a similar expression. | The hypothesis omits the entire content text explaining the benefits of AI coding and the reason for sharing. |
| CohereLabs/tiny-aya-water | Title: CC Switch Self-Improvement Edition Share | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: CC Switch Personal Enhanced Version Share Content: Benefits of AI coding, add desired features anytime Once modified, share it out in case some friends need it too |
Minor
[Fluency]
"Benefits of AI coding, add desired features anytime Once modified, share it out in case some friends need it too"
Rationale: Sentence fragments and lack of proper punctuation make the text unreadable. | Reference uses more natural phrasing 'The advantage of AI coding is that you can add whatever features you want at any time' | Formatting is broken and unnatural. Should be properly structured as title and content separately | Run-on sentence lacking proper punctuation; slightly hurts readability but meaning is clear. |
| Qwen/Qwen2.5-7B-Instruct | Title: CC Switch Custom Enhanced Edition Share Content: Enjoy the benefits of AI coding Add any features you want Whenever you fix something, share it out Just in case someone else might need it too | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Titolo: Condivisione della versione migliorata personale di CC Switch | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: CC Switch Personal Enhanced Edition Share Content: The benefits of AI coding - add any feature you want whenever. Since it's been modified, I'm sharing it out. Maybe some friends will need it too. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: CC Switch Self-Use Enhanced Edition Share Content: The benefits of AI coding; add any desired features anytime. Since I've made the modifications, I'm sharing it just in case other users might need it too. |
Minor
[Style]
"other users"
Rationale: Should be 'other bros need it too' to match the informal tone | Source uses familiar slang "佬友" (bros); "users" loses the colloquial tone but meaning remains. | Loss of community tone. '佬友' implies 'bros' or 'fellow members', not generic 'users'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: CC Enhanced Private Share |
Major
[Accuracy]
"CC Enhanced Private Share"
Rationale: The hypothesis is too abbreviated and loses meaning. Should be 'Sharing my modified/enhanced version of CC-Switch for personal use' to match the reference and convey the full context. | Omits ‘Switch’ (CC Switch) and idea of ‘personal use enhanced version’ and ‘sharing’; title is incomplete relative to source/reference content. | The hypothesis only translates the title and completely omits the content text ('AI coding 的好处想要什么功能随时加上...'). | The translation should specify 'CC-Switch'. |
| google/gemma-3-12b-it | Title: CC Switch Enhanced Personal Use Edition Sharing Content: Benefits of AI coding - add any features you want - since I've modified it, I'll release it - just in case other users need it too. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Translation of 'CC Switch 自用增强版 分享' in en |
Critical
[Accuracy]
"Translation of 'CC Switch 自用增强版 分享' in en"
Rationale: The model outputted a meta-statement instead of performing the translation. | Meta-text rather than a translation; does not convey the actual meaning (“Sharing my enhanced version of CC Switch for personal use”). | The translation is incomplete and does not provide the actual content of the source text. | Translation is a generic placeholder 'Translation of...' instead of actual translation. Content section is completely omitted | Not a translation - this is a meta-description. Should be 'Sharing my modified/enhanced version of CC-Switch for personal use'. Missing content section. |
| google/gemma-3-4b-it | Title: CC Switch Self-Use Enhancement Edition Sharing Content: What features do you want for AI coding? Add them as needed. Since the modification is done, let’s share it. Just in case some seniors also need it. | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: CC Switch - Enhanced Version for Personal Use, Sharing Content: Benefits of AI coding, add any features you want at any time. Since I've made some improvements, I'm releasing it. In case anyone else needs it. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: CC Switch Enhanced Version for Personal Use - Sharing Content: I want to add any desired features for AI coding. Since the modification is complete, share it. In case anyone else needs it. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of 'CC Switch 自用增强版 分享' in en |
Minor
[Accuracy]
"Translation of 'CC Switch 自用增强版 分享' in en"
Rationale: Placeholder; missing actual translation about sharing a personally enhanced version of CC Switch. | The hypothesis is a meta-description of the translation task, not the translation itself. It fails to translate the source title or content. | Hypothesis is incomplete and does not provide an actual English translation. It only states 'Translation of [Chinese text] in en' without delivering the translated content. Reference provides: 'Sharing my modified/enhanced version of CC-Switch for personal use' | Hypothesis is not a translation but a placeholder text indicating translation request |
| tencent/HY-MT1.5-7B | "en": "Title: CC Switch Enhanced Version for Personal Use - Sharing", "zh-TW": "标题:CC Switch个人增强版分享", "ja": "标题:CC Switch個人強化版分享", "ko": "标题:CC Switch個人強化版分享", "de":="Titel: CC Switch verbesserte Version für den persönlichen Gebrauch – Sharing", "fr": "Titre : CC Switch version améliorée pour l’usage personnel – Partage", "it": "Titolo: CC Switch versione migliorata per uso personale – Condivisione"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Share a free watermark remover tool (Can remove banana watermarks) | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Share a free watermark remover tool (Can remove banana watermarks) |
Minor
[Accuracy]
"banana watermarks"
Rationale: Should be 'Banana Pro watermarks'. Reference specifies 'Banana Pro', not just 'banana' | The translation omits 'Pro' in 'Banana Pro', which is less precise. | Mistranslation of '大香蕉'. In this context (software tools), 'Big Banana' likely refers to a specific app or brand (e.g., Banana Pro), not the fruit. The reference uses 'Banana Pro'. Literal translation loses the specific entity reference. | The source implies removing watermarks including from "大香蕉" (a specific service, Banana Pro); literal "banana" loses the proper-noun nuance but overall function remains clear. |
| Qwen/Qwen2.5-14B-Instruct | Title: Share a free watermark remover tool (can remove Big Banana watermark) Content: No account registration required, can remove watermarks from 5 images per day, quota resets daily |
Major
[Accuracy]
"Big Banana watermark"
Rationale: Original refers to "大香蕉" which is a product/brand (Banana Pro); "Big Banana" risks being taken literally or differently branded. | Should be 'Banana Pro watermarks' not 'Big Banana'. Reference correctly identifies the product | Failed to recognize '大香蕉' as a specific product name (Banana Pro), translating it literally. |
| Qwen/Qwen2.5-7B-Instruct | Title: Share a Free Watermark Removal Tool (Can Remove BigBanana Watermarks) Content: No account registration required, remove watermarks from 5 images daily, quota refreshed daily | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Condividi uno strumento gratuito per rimuovere i filigrana (può rimuovere il filigrana LinuxDo) |
Minor
[Accuracy]
"(può rimuovere il filigrana LinuxDo)"
Rationale: Mistranslation of '大香蕉' (Big Banana). The source refers to 'Big Banana' watermarks, but the hypothesis incorrectly translates it as 'LinuxDo' watermarks. | Source specifies Banana Pro watermarks, not LinuxDo; brand is mistranslated. | The term 'filigrana LinuxDo' is a mistranslation of '大香蕉浮水印'. | Translation is in Italian instead of Chinese. Target language is zh (Chinese), not Italian. |
| Qwen/Qwen3.5-4B | Title: Share a free watermark remover tool (removes Big Banana watermarks) Content: No registration required, up to 5 images per day can have their watermarks removed, quota updates daily. |
Major
[Accuracy]
"(removes Big Banana watermarks)"
Rationale: Incorrect name: 'Big Banana' should be 'Banana Pro' as in reference. | Should be 'Banana Pro watermarks'; 'Big Banana' is incorrect translation of the tool name | Failed to recognize '大香蕉' (Big Banana) as a specific product name or slang (likely 'Banana Pro' as per reference). While 'Big Banana' is a literal translation, it might miss the specific brand reference intended. | “大香蕉” in this context refers to Banana Pro branding; “Big Banana” is a literal but incorrect product rendering compared with “Banana Pro” in the reference. |
| Qwen/Qwen3.5-9B | "en": "Title: Share a free watermark removal tool (can remove Big Banana watermarks)\nContent: No account registration required; up to 5 images can be removed daily, quota resets every day", "zh-TW": "標題:分享一款免費的水印去除工具(可移除大香蕉水印)\n內容:無需註冊帳號,每天可去除5張圖片的水印,額度每日重置", "ja": "タイトル:無料のウォーターマーク削除ツールの共有(「大バナナ」ウォーターマークも削除可能)\nコンテンツ:アカウント登録不要。毎日5枚の画像まで削除可能、クォータは毎日リセットされます", "ko": "제목: 무료 워터마크 제거 도구 공유 (대바나나 워터마크 제거 가능)\n내용: 회원가입 불필요. 매일 최대 5 장의 이미지 제거 가능, 할당량은 매일 초기화됩니다", "de": "Titel: Teilen Sie ein kostenloses Wasserzeichen-Entfernungs-Tool (kann „Big Banana"-Wasserzeichen entfernen)\nInhalt: Keine Registrierung erforderlich; bis zu 5 Bilder können täglich entfernt werden, Kontingent wird täglich zurückgesetzt", "fr": "Titre : Partager un outil gratuit de suppression de filigrane (peut supprimer les filigranes « Big Banana »)\nContenu : Aucun compte à enregistrer ; jusqu'à 5 images peuvent être supprimées par jour, le quota est réinitialisé quotidiennement", "it": "Titolo: Condividi uno strumento gratuito per la rimozione del marchio d'acqua (può rimuovere i marchi d'acqua \"Big Banana\")\nContenuto: Non è necessaria alcuna registrazione; fino a 5 immagini possono essere rimosse ogni giorno, il limite viene resettato quotidianamente" } |
Major
[Accuracy]
"Big Banana watermarks"
Rationale: Should be 'Banana Pro watermarks' not 'Big Banana watermarks' | Source says it can remove "大香蕉" watermarks, which contextually refers to "Banana Pro"; "Big Banana" deviates from the branded term. | Mistranslation of slang '大香蕉'. Reference indicates this likely refers to 'Banana Pro' (a specific service/model), not a literal fruit. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Share a Free Watermark Removal Tool (Can Remove Big Banana Watermarks) |
Major
[Accuracy]
"Big Banana"
Rationale: The hypothesis only translates the title and omits the content text ('不须注册帐号,每天可移除 5 张图片的浮水印...'). | “大香蕉” is a specific watermark/brand (Banana Pro); “Big Banana” changes the proper name and introduces unintended connotations. | The hypothesis translates '大香蕉' as 'Big Banana' but should be 'Banana Pro'. '大香蕉' is a reference to a specific service/brand, not literal bananas.
Minor
[Terminology]
"Big Banana"
Rationale: The term '大香蕉' should be translated as 'Banana Pro'. | Incorrect translation - '大香蕉' is a nickname/slang for 'Banana Pro', not literal 'Big Banana'. Should be 'Banana Pro' | '大香蕉' is likely a specific brand or slang (possibly 'Banana Pro' as in reference). 'Big Banana' is a literal translation that might miss the specific product name nuance, though acceptable depending on context. However, the main issue is missing content. |
| google/gemma-3-12b-it | Title: Share a free watermarks remover tool (can remove DaXiangJiao watermark) Content: No registration required, you can remove watermarks from 5 images per day, and the quota is updated daily. |
Major
[Terminology]
"DaXiangJiao watermark"
Rationale: Keeps pinyin instead of using an understandable English brand/term such as "Banana Pro"; may hinder clarity for non-Chinese readers. | Should be 'Banana Pro watermarks' not 'DaXiangJiao watermark'. DaXiangJiao is the Chinese name but the reference uses the English brand name. | '大香蕉' is a specific brand/tool name (Banana Pro). Transliterating as 'DaXiangJiao' loses the recognized brand identity compared to 'Banana Pro'. |
| google/gemma-3-1b-it | Remove a large banana watermark for free daily |
Major
[Accuracy]
"Remove a large banana watermark for free daily"
Rationale: Misinterprets “大香蕉” as literal large banana instead of Banana Pro branding; also omits that it is a tool and that it can process 5 images per day with quota reset. | Misinterprets the source. The source says the tool *can* remove 'Big Banana' (a specific software) watermarks. The hypothesis sounds like an instruction or a claim that it *only* removes banana watermarks daily, missing the '5 images' limit and the tool sharing context. | Incomplete and inaccurate - only partial translation of title. Should be 'Sharing a free watermark removal tool (can remove Banana Pro watermarks)'. Missing content section entirely. |
| google/gemma-3-4b-it | Share a free watermark removal tool (can remove Big Banana watermark) |
Major
[Accuracy]
"Big Banana watermark"
Rationale: “大香蕉” in this context refers to “Banana Pro” branding, not literally “Big Banana”; product/brand name mistranslated. | Missing content translation entirely; only title provided | Mistranslation of '大香蕉'. In this context, it likely refers to a specific brand or tool name (e.g., 'Banana Pro'), not a literal big banana fruit. |
| google/translategemma-12b-it | Title: Share a free watermark removal tool (removes Big Banana watermarks) Content: No account registration required, you can remove watermarks from up to 5 images per day, and the quota is updated daily. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Share a free watermark removal tool (can remove "Big Banana" watermark) | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '分享一個免費浮水印去除工具 (可移除大香蕉浮水印)' in en |
Major
[Accuracy]
"Translation of '分享一個免費浮水印去除工具 (可移除大香蕉浮水印)' in en"
Rationale: Hypothesis is not a translation but a placeholder text indicating translation request | Placeholder; lacks concrete translation about sharing a free watermark removal tool for Banana Pro watermarks. | The hypothesis is a meta-description string, not the actual translation of the source text about a watermark removal tool. |
| tencent/HY-MT1.5-7B | Title: Share a free watermark removal tool (removes large banana watermarks) |
Major
[Accuracy]
"large banana watermarks"
Rationale: Literal translation of slang '大香蕉'. This likely refers to a specific software or service named 'Banana' (e.g., Banana Pro), not the fruit. The reference clarifies it as 'Banana Pro'. | "大香蕉" in this context refers to the Banana Pro brand, not a literal large banana; brand meaning is lost. | Source uses '大香蕉' (big banana) as slang for Banana Pro watermark service, but hypothesis translates it literally as 'large banana watermarks' instead of 'Banana Pro watermarks' |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: Please ask the big guys about the API key of openclaw. They might be able to help. |
Major
[Accuracy]
"Please ask the big guys about the API key of openclaw. They might be able to help."
Rationale: Reduces the source's request for a recommended "API channel / good API key" to merely asking the experts about an API key, omitting key information such as "third-party channels are expensive" and "for use with openclaw"; the meaning is incomplete. | The hypothesis changes the speech act from a direct plea/request ('Begging the experts...') to a suggestion ('Please ask... They might be able to help'). It also omits the title and the context about third-party channels being expensive. | Missing title translation entirely. Should include: 'Besides Wong's free API proxy site, can any bro recommend an API channel?' | The translation does not address the title as instructed and instead provides a content translation. |
| CohereLabs/tiny-aya-water | Request: Please introduce an API channel that is free and easy to use. Also, tell me how to get an API key from openclaw. |
Minor
[Accuracy]
"Request: Please introduce an API channel that is free and easy to use. Also, tell me how to get an API key from openclaw."
Rationale: Changes title to a request sentence and adds "free" and "easy to use" which are not explicit; also shifts from asking for API keys to asking how to get a key, slightly altering focus. | The translation omits context about avoiding third-party channels. | Hallucination/Inaccuracy. The source asks for an API channel to connect *to* OpenClaw because third-party channels are expensive. The hypothesis says 'get an API key *from* openclaw', which reverses the relationship. OpenClaw is the client/tool needing an API provider. | Missing title translation entirely. Should include title: 'Besides Wong's free API proxy site, can any bro recommend an API channel? Third-party channels are too expensive. I want to connect it to OpenClaw.' |
| Qwen/Qwen2.5-14B-Instruct | Title: Wong Elder's Free Shared Site Is there an elder who can introduce an API channel? Third-party channels are too expensive. Integrate with OpenClaw for use Content: Begging for help from elders, please recommend a good API key for OpenClaw |
Minor
[Style]
"elders"
Rationale: Again "elders" is an odd rendering of "大佬" (experts/bros), slightly off register. | Overly literal translation of '跪求大佬'. | Reference uses 'Begging the experts' - slightly different tone |
| Qwen/Qwen2.5-7B-Instruct | Title: Wonglao Public API Site Are there any big shots? Introduce an API channel Third-party channels are too expensive Use OpenClaw for access Content: Kneeling request for big shots OpenClaw good API key introduction needed |
Major
[Fluency]
"Wonglao Public API Site"
Rationale: Should be 'Wong's public/free API site'; 'Wonglao' is an awkward transliteration of 'Wong佬'. | The title is a run-on string of keywords without proper punctuation or grammatical connection, making it hard to read. | Incoherent translation. Should be 'Besides Wong's free API proxy site, can any bro recommend an API channel?' |
| Qwen/Qwen3-14B | Title: Wong's Free Shared API Site Is there any master who can introduce an API channel? Third-party channels are too expensive. Integrate with OpenClaw usage Content: Please help me, masters. I'm begging for a good OpenClaw API key introduction. | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Wong's Free Shared API Site - Any experts? Recommend an API channel. Third-party channels are too expensive. Using OpenClaw integration. Content: Begging for help! Seeking good OpenClaw API keys. Please recommend. |
Minor
[Accuracy]
"Wong's Free Shared API Site"
Rationale: Source asks “Wong佬公益站 有没有佬 介绍一个api渠道” meaning “Aside from Wong’s public site, can anyone recommend...”; hypothesis makes it sound like introducing Wong’s site itself and omits the “besides” nuance. | Should be 'Wong's free API proxy site'; terminology inconsistency | Incorrect: The title implies Wong's site is being discussed, but reference says 'Besides Wong's free API proxy site' - meaning looking for alternatives to Wong's. | The source 'Wong 佬公益站' implies 'Wong's free station'. The translation adds 'API Site' which is inferred but acceptable. However, the structure 'Any experts? Recommend...' is a bit fragmented. |
| Qwen/Qwen3.5-9B | Title: Wong's Free Shared API Site - Does anyone have recommendations for an API channel? Third-party channels are expensive; I need one compatible with OpenClaw. Content: Begging experts for a good OpenClaw-compatible API key recommendation. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Wong佬公益站 有没有佬 介绍一个api渠道 三方渠道好贵 接入openclaw使用 Content: 值得一提的是,Wong佬的公益站提供了免费的API访问,这对于需要接入OpenClaw系统的开发者来说非常实用。 请问您是否需要进一步了解如何获取这些API密钥? |
Critical
[Accuracy]
"Title: Wong佬公益站 有没有佬 介绍一个api渠道 三方渠道好贵 接入openclaw使用"
Rationale: Simply copies the Chinese source as ‘translation’ for a zh target-language task that actually wants an English translation; no translation performed. | Hypothesis is mostly unchanged from source but adds extra content not in source ('值得一提的是...' and '请问您是否需要进一步了解...') - this is hallucination/addition | The hypothesis fails to translate the title, leaving it in the source language (Chinese). | The translation should provide a proper English version.
Critical
[Accuracy]
"Content: 值得一提的是,Wong佬的公益站提供了免费的API访问,这对于需要接入OpenClaw系统的开发者来说非常实用。
请问您是否需要进一步了解如何获取这些API密钥?"
Rationale: The hypothesis content is a hallucination. It invents a polite response and information about Wong's site instead of translating the source content ('跪求大佬 openclaw 好用的 api key 求介绍'). | Adds ungrounded information about Wong’s site providing free API access and asks questions not in the source; original request for recommendations is lost. | Entire Content section is fabricated/hallucinated - not present in source at all |
| google/gemma-3-12b-it | Title: Looking for a good public proxy (free API site). Third-party channels are too expensive; need an API channel for use with OpenClaw. |
Major
[Accuracy]
"good public proxy (free API site)"
Rationale: The hypothesis omits the specific reference to 'Wong 佬' (Brother Wong) and combines the title and content confusingly. It misses the nuance of asking for a recommendation *besides* Wong's site. | Original mentions "Wong佬公益站" then asks for another channel; hypothesis omits the explicit reference to Wong’s site and slightly shifts the nuance from "besides Wong's" to a general search. | Missing context. Should reference 'Wong佬公益站' (Wong's free API proxy site) as mentioned in the source. The hypothesis doesn't capture the specific reference. |
| google/gemma-3-1b-it | Translation of 'Wong佬公益站 有没有佬 介绍一个api渠道 三方渠道好贵 接入openclaw 使用' in en |
Critical
[Accuracy]
"Translation of 'Wong佬公益站 有没有佬 介绍一个api渠道 三方渠道好贵 接入openclaw 使用' in en"
Rationale: Not a translation - this is a meta-description. Should provide actual translation of title and content about finding API channels for OpenClaw. | The model outputted a meta-statement instead of performing the translation. | The translation is incomplete and does not provide the actual content of the source text. | Translation is a generic placeholder 'Translation of...' instead of actual translation. Content section is completely omitted | Pure meta-text; no actual translation of the request for an API channel besides Wong’s free proxy, to be used with OpenClaw. |
| google/gemma-3-4b-it | Title: Wong Lau Public Proxy Does anyone have a bro who can introduce an API channel? Three-party channels are expensive. Access openclaw. |
Minor
[Terminology]
"Three-party channels"
Rationale: Should be 'Third-party channels' not 'Three-party channels' | Reference says 'Third-party channels' - '三方' means third-party, not three-party | Standard term is “third-party channels”; “three-party” is non-idiomatic. |
| google/translategemma-12b-it | Title: Looking for recommendations for 'Wong佬' public API sites. Third-party channels are too expensive; I need an API channel for use with OpenClaw. |
Major
[Accuracy]
"Looking for recommendations for 'Wong佬' public API sites."
Rationale: The hypothesis only translates the title and omits the 'Content' section ('跪求大佬 openclaw 好用的 api key 求介绍'). | The hypothesis says 'Looking for recommendations for Wong佬 public API sites' when the reference clarifies it's 'Besides Wong's free API proxy site', indicating Wong佬 is a specific person/site being referenced. | Misreads the title: source contrasts Wong's public site with other API channels; here it sounds like looking for sites belonging to Wong instead of alternatives to Wong's site. | Incorrect translation: 'Wong佬' should be 'Wong's' - 佬 is a suffix meaning 'dude/bro', so 'Wong佬' refers to a person named Wong |
| google/translategemma-4b-it | Title: Wong's Free Shared API Site - Does anyone have an API channel recommendation? Three-party channels are expensive. Using Openclaw. |
Major
[Accuracy]
"Wong's Free Shared API Site - Does anyone have an API channel recommendation?"
Rationale: Mistranslation - '三方渠道' means 'third-party channels' not 'Three-party channels'. Also the title structure is confusing - should be 'Besides Wong's free API proxy site, can any bro recommend an API channel?' | Implies Wong owns the free site and treats the rest as a subtitle; source asks "Besides Wong's free site, can anyone recommend..."; relationship and contrast are lost. | The source 'Wong 佬公益站' is the topic, and the user is asking for *another* channel because third-party ones are expensive. The hypothesis phrasing makes it sound like the title is defining Wong's site, whereas the reference clarifies the user is looking for alternatives *besides* or *like* Wong's. |
| tencent/HY-MT1.5-1.8B | Title: Wong Lao Public Welfare Station: Are There Any Lao People Who Can Introduce an API Channel? The Third-Party Channels Are Too Expensive. Access OpenClaw for Use. |
Major
[Accuracy]
"Title: Wong Lao Public Welfare Station: Are There Any Lao People Who Can Introduce an API Channel? The Third-Party Channels Are Too Expensive. Access OpenClaw for Use."
Rationale: Mistranslation of '佬' (slang for 'bros' or 'experts' in this community context) as 'Lao People' (ethnic group). This changes the meaning significantly and sounds unnatural. | 'Wong佬公益站' is misinterpreted; should mean "Besides Wong's free API site" rather than a literal "Wong Lao Public Welfare Station" and "Lao people". Also misses nuance of asking bros, and phrasing is awkward. | Awkward and inaccurate translation. Should be 'can any bro recommend' not 'Are There Any Lao People Who Can Introduce'. | Hypothesis is English but target language is zh; also contains mistranslation - 'Wong Lao' should be 'Wong's', 'Lao People' should be 'bros', and content doesn't match reference |
| tencent/HY-MT1.5-7B | "en": "Title: Does anyone know about the Wong佬 Public Welfare Site? Can someone recommend an API channel? Third-party channels are too expensive. I want to use it with OpenClaw.", "zh-TW": "标题:有人知道Wong佬公益站吗?能推荐一个API通道吗?第三方渠道太贵了,我想用它搭配OpenClaw使用。", "ja": "タイトル:Wong佬公益站について知っている方はいますか?APIチャネルをお勧めいただけますか?第三者チャネルは高額すぎます。OpenClawと一緒に使用したいです。", "ko": "제목: Wong佬 공익사이트에 대해 아는 분 계신가요? API 채널을 추천해 주실 수 있나요? 제3자 채널은 너무 비쌉니다. OpenClaw와 함께 사용하고 싶습니다.", "de":="Titel: Kennt jemand die Wong佬 Public Welfare Site? Können Sie einen API-Channel empfehlen? Drittanbieter-Channels sind zu teuer. Ich möchte sie mit OpenClaw verwenden.", "fr":="Titre : Connaît-on la Wong佬 Public Welfare Site ? Quelqu’un peut-il recommander un canal API ? Les canaux tiers sont trop chers. Je souhaite les utiliser avec OpenClaw.", "it":="Titolo: Qualcuno conosce la Wong佬 Public Welfare Site? Potete consigliarmi un canale API? I canali di terze parti sono troppo costosi. Vorrei usarli con OpenClaw."}|im_end|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Re-draw lottery bar gemini US family group slot/year*1 | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Re-draw lottery bar gemini US family group slot/year*1 |
Minor
[Fluency]
"Re-draw lottery bar"
Reason: "bar" is an incorrect carryover from Chinese suffix "吧"; should be omitted or rendered as "Let's redraw the lottery". | The translation uses 'Re-draw lottery bar' which is awkward. | Awkward phrasing. Should be 'Let's restart the draw' or 'Restarting the lottery' | Unnecessary addition: 'bar' is not in the source - '抽奖' means 'draw/lottery' | 'lottery bar' is a mistranslation of '吧' (particle indicating suggestion or topic, or 'let's'). It should be 'Let's re-draw' or similar. 'bar' makes no sense here. |
| Qwen/Qwen2.5-14B-Instruct | Title: Re-draw for a shared Gemini US family group subscription slot/year *1 Content: Draw: Gemini Pro US family group *1 (no warranty) I will add you to the family group You need your own US account To be added to the family group |
Minor
[Fluency]
"Draw: Gemini Pro US family group *1"
Reason: Missing article and "Plan"; "Gemini Pro US Family Plan *1" would match reference and be more natural. | Formatting is broken with run-on sentences. Should be properly punctuated and structured | Reference says 'Lucky Draw: Gemini Pro US Family Plan' - more natural phrasing |
| Qwen/Qwen2.5-7B-Instruct | Title: Try drawing lots again for gemini US family group slot/year *1 Content: Drawing: gemini pro US family group *1(no warranty) I will add you to the family group, you need to prepare a US account yourself to join the family group | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: Try drawing again gemini US family group slot/year*1 Content: Draw: gemini pro US family group*1 (no warranty) I will add you to the family group, need to prepare your own US account, add you to the family group | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Re-draw! gemini US Family Group shared subscription slot/year *1 Content: Draw: gemini pro US Family Group *1 (no warranty) I'll add you to the family group. Need your own US account. Add you to the family group. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Try your luck again! Gemini US Family Slot/Year *1 Content: Lucky Draw: 1x Gemini Pro US Family Subscription (No Warranty). I'll invite you to the family group. Please prepare your own US account. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Re-draw the lottery, Gemini Pro family group car seat/year *1 Content: Lottery draw: Gemini Pro US family group *1 (no warranty) I'll invite you to join the family group. You need to have your own US account to join the family group. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Raffle: Gemini Pro US Family Group *1 (No Warranty) I’ll add you to the family group - requires your own US account. Adding you to the family group. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Lucky draw: Gemini Pro family car space/year *1 | Pass (no consensus errors) |
| google/gemma-3-4b-it | "en": "Re-roll Gemini US Family Plan Slot/Year *1", "zh-TW": "重新抽獎 車位/年\*1", "ja": "ジェミニUSファミリープランの枠/年\*1を再抽選", "ko": "제미니 미국 가족 플랜 슬롯/년 \*1 재추첨", "de": "Gemini US Familienplan Slot/Jahr *1 neu ziehen", "fr": "Refaire le tirage pour une place Gemini US Familiale/Année *1", "it": "Ricarica il concorso per un posto Gemini USA Familiare/Anno *1" } <|im_end|> | Pass (no consensus errors) |
| google/translategemma-12b-it | Raffle: Gemini Pro US Family Plan *1 (no warranty). I'll add you to the family plan; you need your own US account. I'll add you to the family plan. | Pass (no consensus errors) |
| google/translategemma-4b-it | "en": "Title: Re-enter the lottery gemini US family group slot/year *1\nContent: Lottery: gemini pro US family group *1 (no guarantee) I invite you to the family group Need to have your own US account Invite you to the family group", "zh-TW": "標題:重新抽獎 gemini 美國家庭組 車位/年\*1\n內容:抽獎:gemini pro 美國家庭組\*1(無保證) 我邀請您加入家庭組 需要自行準備美國帳號 邀請您加入家庭組", "ja": "タイトル:再度抽選 gemini 米国ファミリーグループ スロット/年 \*1\nコンテンツ:抽選:gemini pro 米国ファミリーグループ \*1(保証なし) 私があなたをファミリーグループに招待 独自の米国のアカウントをご用意する必要があります ファミリーグループへの招待", "ko": "제목: 다시 로또 참여 gemini 미국 가족 그룹 슬롯/년 \*1\n내용: 로또: gemini pro 미국 가족 그룹 \*1 (보증 없음) 저는 당신을 가족 그룹에 초대 필요: 자체 미국 계정을 준비해야 합니다 가족 그룹 초대", "de": "Titel: Neu starten Sie den Wettbewerb gemini US Familien-Gruppe Slot/Jahr \*1\nInhalt: Wettbewerb: gemini pro US Familien-Gruppe \*1 (keine Garantie) Ich lade Sie zur Familien-Gruppe ein Benötigen Sie Ihr eigenes US-Konto Einladung zur Familien-Gruppe", "fr": "Titre : Réinscrivez-vous au tirage au sort gemini groupe familial américain Slot/Année \*1\nContenu : Tirage au sort : gemini pro groupe familial américain \*1 (sans garantie) Je vous invite à rejoindre le groupe familial Vous devez avoir votre propre compte américain Invitation au groupe familial", "it": "Titolo: Riscrivetevi al concorso gemini gruppo familiare statunitense Slot/Anno \*1\nContenuto: Concorso: gemini pro gruppo familiare statunitense \*1 (senza garanzia) Vi invito a unirvi al gruppo familiare Avete bisogno del vostro account statunitense Invito al gruppo familiare" } <|im_end|> | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 重新抽奖吧 gemini美区家庭组车位/年*1' in en |
Major
[Accuracy]
"Translation of 'Title: 重新抽奖吧 gemini美区家庭组车位/年*1' in en"
Reason: Hypothesis is not a translation but a placeholder text indicating translation request | The hypothesis is a meta-description of the task, not the actual translation of the source text regarding a giveaway. | Placeholder; does not provide the explicit translation about restarting the draw for one Gemini US family plan seat per year. |
| tencent/HY-MT1.5-7B | Re-draw lots again! Gemini US family group spot/year *1 | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Raising Dragons and Crabs, OPC, Do you really have the potential to be an entrepreneur? |
Critical
[Accuracy]
"Raising Dragons and Crabs, OPC"
Reason: The source '养龙虾' is slang for running OpenClaw. The hypothesis mistranslates it as 'Raising Dragons and Crabs', introducing hallucinations ('Dragons') and wrong animals ('Crabs'), completely missing the technical context. | The source “养龙虾, OPC” is a metaphor for running OpenClaw/OPC; rendering it as “Raising Dragons and Crabs” brings in “crabs” and “dragons”, completely changing the in-domain metaphor and seriously misleading the reader. | Should be 'Running OpenClaw'. '龙虾' is slang for OpenClaw, not literal dragons and crabs. |
| CohereLabs/tiny-aya-water | Title: Raising Dragons and Crabs, OPC, Do you really have the potential to be a boss? |
Major
[Accuracy]
"Raising Dragons and Crabs"
Reason: Severe mistranslation of slang. '养龙虾' means running OpenClaw. 'Dragons' is completely hallucinated (maybe confused 'long' in some other context? but source is 'longxia' = lobster). This makes the title nonsensical. | The translation uses slang ('Raising Dragons and Crabs') instead of the proper term 'Running OpenClaw'. | "养龙虾" is slang for running/operating OpenClaw; introducing "dragons and crabs" distorts the intended term and adds an extra concept not in the source. | Should be 'Running OpenClaw'. '养龙虾' is a metaphor for running/deploying OpenClaw, not literal dragons and crabs |
| Qwen/Qwen2.5-14B-Instruct | Title: Raising Lobsters, OPC, Do you really have the potential to be a boss?, Content: Originally planned to write for the official account, but let's start with a simplified version here first: 1. When starting a company, the primary concern is making money. Do you have a complete plan for earning money? |
Major
[Accuracy]
"Raising Lobsters, OPC"
Reason: Misses the slang. '养龙虾' refers to running OpenClaw, not literal lobster raising. Should be 'Running OpenClaw, OPC' | Here "养龙虾" refers to running OpenClaw; translating literally as "Raising Lobsters" obscures the core software/business meaning. | Failed to translate slang '养龙虾' (Running OpenClaw/OPC). |
| Qwen/Qwen2.5-7B-Instruct | Title: Raising Lobsters,OPC, Do you really have the potential to be a boss? | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Allevare gamberi, OPC, Hai realmente il potenziale per diventare un imprenditore? |
Major
[Accuracy]
"Allevare gamberi, OPC"
Reason: The hypothesis only translates the title and omits the content about writing a WeChat article, starting a company, and monetization strategies. | ‘养龙虾’ refers to running/deploying OpenClaw, not raising shrimp; treating it literally plus separating from OPC loses the software/business nuance. | Translation is in Italian instead of Chinese. Target language is zh (Chinese), not Italian. |
| Qwen/Qwen3.5-4B | Title: Raising Lobsters, OPC, Do you really have the potential to be a boss? Content: Originally planned to write for WeChat Official Account, but will publish this simplified version here first: 1. When starting a company, the first thing to consider is making money. Do you have a complete path to earning income? |
Major
[Accuracy]
"Raising Lobsters, OPC"
Reason: “养龙虾” is about running OpenClaw; literal ‘Raising Lobsters’ distorts the business/tech metaphor. | Failed to translate the slang '养龙虾' (raising lobsters) which refers to 'Running/Deploying OpenClaw'. The literal translation makes the title about agriculture. | Should be 'Running OpenClaw, OPC'; 'raising lobsters' is incorrect metaphor translation |
| Qwen/Qwen3.5-9B | Title: Raising Lobsters, OPC, Do you really have the potential to be a boss? Content: Originally planned to write for WeChat Official Account, but will send this simplified version here first: 1. Starting a company, the first thing to consider is making money. Do you have a complete way to make money? |
Major
[Accuracy]
"Raising Lobsters, OPC"
Reason: Literal translation of slang '养龙虾'. Should be 'Running OpenClaw' or 'Deploying OpenClaw'. | "养龙虾" is slang for running/deploying OpenClaw; "Raising Lobsters" suggests literal animals and loses the technical/business sense. | Should be 'Running OpenClaw, OPC' not 'Raising Lobsters, OPC'; misses slang meaning |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Crabs Farming, OPC, Do you have the potential to be a boss? |
Major
[Accuracy]
"Crabs Farming, OPC"
Reason: ‘养龙虾’ here refers to running OpenClaw, not ‘Crabs Farming’; introduces wrong animal and loses OpenClaw metaphor. | The hypothesis translates '龙虾' as 'Crabs' but should be 'OpenClaw'. This is a consistent mistranslation of the slang term used throughout the forum. | The hypothesis only translates the title and omits the content text ('本来打算写公众号的...').
Major
[Terminology]
"Crabs Farming"
Reason: Incorrect translation - '养龙虾' here is slang for deploying OpenClaw (software), not literal crab farming. Should be 'Running OpenClaw' or 'OpenClaw deployment' | Mistranslation of '养龙虾'. It refers to deploying/running 'OpenClaw', not farming crabs. | The term '养龙虾' should be translated as 'Running OpenClaw'. |
| google/gemma-3-12b-it | Title: Raising Lobsters, OPC, Do you really have the potential to be a boss? |
Critical
[Accuracy]
"Raising Lobsters, OPC"
Reason: Literal translation of slang '养龙虾' (Running OpenClaw/OPC). Completely changes the meaning to agriculture. | "养龙虾" here again refers to running OpenClaw; literal "Raising Lobsters" misrepresents the core topic compared to "Running OpenClaw" in reference. | Misses slang meaning. '养龙虾' refers to running/deploying OpenClaw, not raising lobsters. Should be 'Running OpenClaw'. |
| google/gemma-3-1b-it | Title: 养龙虾,OPC, 你真的有当老板的潜质么? |
Critical
[Accuracy]
"Title: 养龙虾,OPC, 你真的有当老板的潜质么?"
Reason: This simply repeats the Chinese title and does not translate it, failing to convey meaning in the target language. | Not translated at all - hypothesis is identical to source. Should be 'Running OpenClaw, OPC... do you really have the potential to be a boss?' Missing content section. | The model failed to translate the text, outputting the original Chinese source instead. |
| google/gemma-3-4b-it | Lobster soup |
Critical
[Accuracy]
"Lobster soup"
Reason: Completely wrong translation - reference is about running OpenClaw/OPC for business, not lobster soup | Completely unrelated to source; omits all key concepts (OpenClaw, OPC, being a boss/entrepreneur). | Should be 'Running OpenClaw, OPC... do you really have the potential to be a boss?' instead of 'Lobster soup'. | Complete hallucination/mistranslation. The source title is about 'Raising Lobsters (OpenClaw)' and 'OPC', not soup. The hypothesis bears no resemblance to the source meaning. | Completely incorrect translation; should be 'Running OpenClaw, OPC... do you really have the potential to be a boss?' with full content |
| google/translategemma-12b-it | Title: Lobster Farming, OPC - Do you really have what it takes to be a boss? | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Raising Lobsters, OPC, Do you really have the potential to be a boss? |
Major
[Accuracy]
"Raising Lobsters"
Reason: The translation incorrectly translates 'OpenClaw' as 'Lobsters'. | Mistranslation - '养龙虾' is slang for 'running OpenClaw', not literal 'raising lobsters'. Should be 'Running OpenClaw, OPC... do you really have the potential to be a boss?' | "养龙虾" here is slang for running/deploying OpenClaw; literal "Raising Lobsters" misrepresents the topic as about animals instead of a tool/business. | Literal translation of the slang '养龙虾'. It should be 'Running OpenClaw' or 'Deploying OpenClaw'. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '养龙虾,OPC, 你真的有当老板的潜质么?' in en", "zh-TW": "Translation of '養龍蝦,OPC, 你真的有當老闆的潛力麼?' in zh-TW", "ja": "Translation of '養龍蝦,OPC, 你真的有當老闆的潛力麼?' in ja", "ko": "Translation of '養龍蝦,OPC, 你真的有當老闆的潛力麼?' in ko", "de": "Translation of 'Pflanzen-Luftkrebs,OPC, Hast du wirklich die Fähigkeit, ein Unternehmen zu leiten?', in de", "fr": "Translation of 'Pflanzen-Luftkrebs,OPC, As-tu vraiment la capacité de diriger une entreprise?', in fr", "it": "Translation of 'Pflanzen-Luftkrebs,OPC, Hai davvero la capacità di gestire un'azienda?", in it } |
Major
[Accuracy]
""en": "Translation of '养龙虾,OPC, 你真的有当老板的潜质么?' in en""
Reason: Hypothesis is not a translation but a placeholder text with multiple language tags; target language is zh but hypothesis contains multiple language codes | English field is placeholder and does not translate the idea of 'Running OpenClaw, OPC... do you really have the potential to be a boss?'. | The hypothesis is a malformed JSON-like structure containing meta-descriptions for multiple languages instead of a single English translation. It fails to provide the actual translation. |
| tencent/HY-MT1.5-7B | "en": "Title: Raising Lobsters, OPC – Do You Really Have the Potential to Be a Boss?", "zh-TW": "标题:養龍蝦,OPC – 你真的有當老闆的潛質嗎?", "ja": "タイトル:ロブスターの飼育、OPC – 本当に経営者になる潜在力がありますか?", "ko": "제목: 로브스터 양식, OPC – 정말로 사장이 될 잠재력이 있나요?", "de":="Titel: Lobsterzucht, OPC – Haben Sie wirklich das Potenzial, ein Chef zu werden?", "fr":="Titre: Élevage de homards, OPC – Avez-vous vraiment le potentiel de devenir patron ?", "it":="Titolo: Allevamento di aragoste, OPC – Avete davvero il potenziale per diventare imprenditori?"} |
Major
[Accuracy]
"Raising Lobsters"
Reason: Source uses '养龙虾' (raising lobsters) as slang for running OpenClaw, but hypothesis translates it literally as 'raising lobsters' instead of 'running OpenClaw' | "养龙虾" is slang for running/deploying OpenClaw, not literal lobster raising; the business/tech metaphor is lost. | Literal translation of slang '养龙虾'. Should be 'Running OpenClaw' or 'Deploying OpenClaw'. |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Register now and enjoy the benefits of our free API service! |
Critical
[Accuracy]
"Register now and enjoy the benefits of our free API service!"
Reason: Hypothesis is completely generic and doesn't match source content at all. Should be a title translation of the source. | Freely rewritten as promotional copy; loses all key information such as “大可乐公益站开放注册” (Big Cola free API proxy site opens registration), “仅有gpt-5.2、5.3-codex、5.4” (only gpt-5.2, 5.3-codex, 5.4), and “注册已结束” (registration has ended), and contradicts the facts of the source. | The hypothesis is a complete hallucination/marketing rewrite. The source states 'Registration has ended' (注册已结束), while the hypothesis says 'Register now'. It also omits the specific model versions and credit donation details. |
| CohereLabs/tiny-aya-water | Title: Big Koola Public Station Registration Opened - Currently Only gpt-5.2, 5.3-Codex, and 5.4 Are Registered - Registration Ended |
Minor
[Accuracy]
"Big Koola"
Reason: Slight misspelling of "Big Cola" (可乐), but referent remains mostly clear. | Should be 'Big Cola'. '大可乐' translates to 'Big Cola', not 'Big Koola' | The translation uses 'Koola' instead of 'Cola', which is less precise. | Mistranslation of '大可乐' (Big Cola, a nickname). 'Koola' is odd. More importantly, 'Currently Only... Are Registered' misinterprets '目前仅有...'. It means the station currently *offers* these models, not that these specific versions are the only ones who registered. Also 'Registration Ended' contradicts 'Registration Opened' in the same sentence due to poor parsing of the source structure (Title says Opened, then lists models, then notes registration ended? Or implies the specific 5.4 reg ended? The reference handles this better). The hypothesis phrasing is confusing. |
| Qwen/Qwen2.5-14B-Instruct | Title: Big Cola Free Shared Site Open Registration - Currently Only gpt-5.2, 5.3-codex, and 5.4 Available; Registration Has Ended Content: Registration URL https://bigkele.us.ci Donation Requested LINUX DO Credit Recharge Supported |
Minor
[Terminology]
"Free Shared Site"
Reason: Reference uses 'free API proxy site' - 'Free Shared Site' is not standard terminology | Should be 'free API proxy site' for consistency and clarity | Generic translation for '公益站'. 'Free API proxy site' is more precise in this context. |
| Qwen/Qwen2.5-7B-Instruct | Title: BigKele Public Proxy Open Registration - Currently Only gpt-5.2, 5.3-Codex, 5.4 Registered Users, Registration Ended Content: Registration Address https://bigkele.us.ci Donate with LINUX DO Credit, Now Supports Points Recharge |
Major
[Accuracy]
"Registered Users, Registration Ended"
Reason: Misinterpretation of the source structure. The source lists available models (5.2, 5.3, 5.4) and then states registration has ended. The hypothesis incorrectly implies that only users of version 5.4 are registered or that 5.4 registration specifically ended in a disjointed way. | Confusing translation. Should be 'Currently only has gpt-5.2, 5.3-codex, 5.4 - Registration has ended'. | Misreads structure; should indicate that the site currently only offers those models and that registration has ended, not that only registered users exist. |
| Qwen/Qwen3-14B | Title: Bigkele Public Proxy Registration Open - Currently Only gpt-5.2, 5.3-codex, 5.4 - Registration Closed Content: Registration address https://bigkele.us.ci Request donations LINUX DO Credit has supported points recharge | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: BigKele Free Shared API Site Registration Open - Currently Only gpt-5.2, 5.3-codex, 5.4 Available; Registration Closed Content: Registration Address https://bigkele.us.ci Request Donations LINUX DO Credit Now Supports Point Recharge | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Big Kele Free Shared API Site Registration Open - Currently Only gpt-5.2, 5.3-codex, 5.4 Registered; Registration Closed Content: Registration URL https://bigkele.us.ci Please tip LINUX DO Credit now supports point recharge |
Minor
[Terminology]
"Big Kele"
Reason: "大可乐" is more idiomatically "Big Cola"; "Big Kele" is a transliteration and less natural but understandable. | 'Kele' is Pinyin for Cola. 'Big Cola' is the standard English translation for '大可乐'. | Should be 'Big Cola free API proxy site' not 'Big Kele Free Shared API Site'
Major
[Fluency]
"Please tip LINUX DO Credit now supports point recharge"
Reason: Missing conjunction or punctuation between two clauses; slightly unnatural but meaning clear. | Run-on sentence and broken syntax. Should be two distinct thoughts: asking for tips/donations and stating that point recharge is supported. | Should be 'Looking for LINUX DO Credit donations. Point top-ups are now supported' for clarity |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Big Cola Public Proxy Open Registration - Currently only GPT-5.2, 5.3-CODEX, 5.4 registration has ended |
Minor
[Terminology]
"Public Proxy"
Reason: Should be 'free API proxy site' or 'free proxy' to match community terminology | ‘公益站’ here specifically means ‘free API proxy site’; ‘Public Proxy’ is vague and misses the ‘API proxy’ aspect. | The hypothesis uses 'Public Proxy' but should be 'free API proxy site' to be more precise and consistent with reference terminology. |
| google/gemma-3-12b-it | Bigcola Public Proxy Site Open Registration - Currently only gpt-5.2, 5.3-codex, 5.4 - Registration has ended | Pass (no consensus errors) |
| google/gemma-3-1b-it | Registration address: https://bigkele.us.ci |
Major
[Accuracy]
"Registration address: https://bigkele.us.ci"
Reason: Only translates the registration URL segment and omits main information that this is Big Cola’s free proxy site, models available, and that registration has ended and donations are sought. | Incomplete translation - only translates one line. Missing title translation and other content about the API proxy site details. | Severe under-translation. The hypothesis only provides the URL, completely omitting the title information about the site status, supported models, and the content about donations/credits. |
| google/gemma-3-4b-it | Title: BigKele Public Proxy Open Registration - Currently Only gpt-5.2, 5.3-codex, 5.4-Registration is closed | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Big Kele Public Proxy Site Now Open for Registration - Currently Only gpt-5.2, 5.3-codex, 5.4 - Registration is Closed Content: Registration Address: https://bigkele.us.ci Please Donate LINUX DO Credit Supported for Point Recharge |
Minor
[Terminology]
"Public Proxy Site"
Reason: The hypothesis uses 'Public Proxy Site' while the reference uses 'free API proxy site', which is more specific and accurate terminology. | Slightly imprecise: 'Public Proxy Site' should be 'free API proxy site' to match the source meaning of 公益站 (free/public welfare site) | Source "公益站" in this community specifically means free API proxy site; "Public Proxy Site" is understandable but less precise. |
| google/translategemma-4b-it | Title: Bigele Free Shared API Site Open Registration - Currently Only gpt-5.2, 5.3- codex, 5.4- Registration is closed |
Major
[Accuracy]
"Bigele"
Reason: Mistranslation - '大可乐' should be 'Big Cola' not 'Bigele'. Also the title structure is awkward with '5.3- codex' - should be 'gpt-5.2, 5.3-codex, 5.4'. | '大可乐' literally means 'Big Cola'. 'Bigele' appears to be a hallucinated transliteration. 'Big Cola' or keeping the Pinyin 'Dakele' would be better, though 'Big Cola' captures the meaning. | Mistranscribes the name "大可乐" which corresponds to "Big Cola"; "Bigele" obscures the intended name. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '大可乐公益站开放注册-目前仅有gpt-5.2、5.3-codex、5.4-注册已结束' in en", "zh-TW": "Translation of '大可乐公益站开放注册-目前仅有gpt-5.2、5.3-codex、5.4-注册已结束' in zh-TW", "ja": "Translation of '大可乐公益站开放注册-目前仅有gpt-5.2、5.3-codex、5.4-注册已结束' in ja", "ko": "Translation of '大可乐公益站开放注册-目前仅有gpt-5.2、5.3-codex、5.4-注册已结束' in ko", "de": "Translation of '大可乐公益站开放注册-目前仅有gpt-5.2、5.3-codex、5.4-注册已结束' in de", "fr": "Translation of '大可乐公益站开放注册-目前仅有gpt-5.2、5.3-codex、5.4-注册已结束' in fr", "it": "Translation of '大可乐公益站开放注册-目前仅有gpt-5.2、5.3-codex、5.4-注册已结束' in it} |
Major
[Accuracy]
""en": "Translation of '大可乐公益站开放注册-目前仅有gpt-5.2、5.3-codex、5.4-注册已结束' in en""
Reason: English field is only a placeholder and omits the actual meaning of Big Cola free API proxy site registration status and supported models. | The hypothesis is a malformed structure containing meta-descriptions for multiple languages instead of the actual English translation of the source text. | Hypothesis is not a translation but a placeholder text with multiple language tags; target language is zh but hypothesis contains multiple language codes |
| tencent/HY-MT1.5-7B | "en": "Title: Big Coke Public Welfare Site Open for Registration - Currently Only gpt-5.2, 5.3-codex, 5.4 Are Available for Registration. Registration Has Ended.", "zh-TW": "Title: 大可乐公益站开放注册 - 目前仅支持gpt-5.2、5.3-codex、5.4版本进行注册。注册已结束.", "ja": "Title: 大可乐公益站开放注册 - 现在仅可注册gpt-5.2、5.3-codex、5.4版本。注册已经结束。」, "ko": "Title: 대콜라 공익사이트 등록 개시 - 현재 gpt-5.2, 5.3-codex, 5.4 버전만 등록 가능합니다. 등록이 이미 종료되었습니다.", "de": "Title: Registrierung für die Big Coke Public Welfare Site ist möglich – derzeit nur gpt-5.2, 5.3-codex, 5.4 sind verfügbar. Die Registrierung ist beendet.", "fr": "Titre : La registration sur le site public d’entraide Big Coke est ouverte – pour l’instant, seules les versions gpt-5.2, 5.3-codex, 5.4 sont disponibles. La registration est terminée.", "it": "Titolo: La registrazione sul sito di beneficenza Big Coke è aperta – al momento sono disponibili solo le versioni gpt-5.2, 5.3-codex, 5.4. La registrazione è conclusa."}|im_end|> | 通过 (无共识错误) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【CurryAPI】【Supported 5.4】Open Registration at Public Site【Closed】 Content: Limited and time-bound offer, up to 2000 users. First come, first served. URL: https://stephecurry.asia Check here for 100x1w exchange codes. |
Major
[Accuracy]
"Limited and time-bound offer, up to 2000 users."
Reason: The source and reference indicate 'no time limit,' but the hypothesis incorrectly adds 'time-bound.' | The source reads “限量不限时【2000人为止】”, which explicitly states there is no time limit, only a cap on slots; the translation adds “time-bound”, producing the opposite meaning. | Should be 'Limited quantity, no time limit' not 'Limited and time-bound offer'. The source says '限量不限时'. | The source says '限量不限时' (Limited quantity, unlimited time). The hypothesis translates this as 'time-bound offer', which is the exact opposite of the source meaning. |
| CohereLabs/tiny-aya-water | Title: 【CurryAPI】【Supported 5.4】Open Registration at Public Station【Closed】 Content: Limited and time-bound access [up to 2000 users], first come, first served. URL: https://stephe curry . asia Check here 100x1w credit exchange codes |
Minor
[Accuracy]
"Limited and time-bound access [up to 2000 users]"
Reason: Source says '限量不限时' (Limited quantity, unlimited time). Hypothesis says 'time-bound access', which is the opposite of 'unlimited time'. | Incorrect: source says '不限时' (no time limit), not 'time-bound access' | Original explicitly states limited quantity with no time limit; "time-bound" contradicts "不限时".
Major
[Accuracy]
"https://stephe curry . asia"
Reason: Incorrectly inserted spaces into the URL, breaking it. | URL has incorrect spacing. Should be 'https://stephecurry.asia' without spaces | Incorrect spacing: should be 'stephecurry.asia' without spaces |
| Qwen/Qwen2.5-14B-Instruct | Title: [CurryAPI] [Supporting Version 5.4] Free Shared Site Registration Open [Closed] Content: Limited but No Time Restriction [Up to 2000 People], First Come, First Served URL: https://stephecurry.asia See Here 100 Codes for 10k Quota LINUX DO CDK |
Minor
[Fluency]
"Limited but No Time Restriction"
Reason: Should be "Limited quantity but no time limit" to sound natural; current phrasing is slightly awkward. | Awkward phrasing for '限量不限时'. | Reference says 'Limited quantity, no time limit' - slightly different phrasing
Minor
[Fluency]
"See Here 100 Codes for 10k Quota LINUX DO CDK"
Reason: Awkward. Should be '100 redemption codes for 10k quota via LINUX DO CDK' | Reference says '100 redemption codes for 10k quota' - more precise | Needs punctuation/connecting words; as-is it's a compressed, telegraphic phrase.
Minor
[Terminology]
"Free Shared Site Registration Open"
Reason: Should be 'Free API proxy site is open for registration' | Generic translation for '公益站'. | Reference uses 'Free API proxy site' - inconsistent terminology |
| Qwen/Qwen2.5-7B-Instruct | Title: [CurryAPI][Supported 5.4] Free Shared API Site Open Registration [Closed] Content: Limited to 2000 people, first come, first served URL: https://stephecurry.asia Check here 100x10k quota redemption code LINUX DO CDK | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: [CurryAPI][Supporting 5.4] Free Shared API Site Open for Registration [Closed] Content: Limited quantity, no time limit [up to 2000 people], first come first served URL: https://stephecurry.asia See here 100 10,000 quota redemption codes LINUX DO CDK | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: [CurryAPI][Supports 5.4] Free Shared API Site Registration Opened [Closed] Content: Limited quantity, unlimited time [Up to 2000 users], first come first served. URL: https://stephecurry.asia See here 100 codes worth 10k credits for LINUX DO CDK |
Minor
[Fluency]
"See here 100 codes worth 10k credits for LINUX DO CDK"
Reason: The phrasing is a bit dense. '100 redemption codes for 10k quota via LINUX DO CDK' is clearer. | Missing comma and articles; slightly awkward order but meaning is clear. | Should be '100 redemption codes for 10k quota'; awkward phrasing |
| Qwen/Qwen3.5-9B | Title: [CurryAPI][Supports 5.4] Free Shared API Site Registration Open [Closed] Content: Limited quantity, no time limit [Up to 2000 people], first come first served. URL: https://stephecurry.asia See here 100 codes worth $10k for Linux DO CDK |
Major
[Accuracy]
"100 codes worth $10k"
Reason: "100个1w额度兑换码" means 100 redemption codes each with 10k quota, not monetary value "$10k". | Should be '100 redemption codes for 10k quota via LINUX DO CDK' not 'worth $10k' | Misinterpretation. '1w 额度' means '10,000 quota/credits', not '$10k' (currency). Also, 'Linux DO CDK' is the source of the codes, not the recipient. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [CurryAPI][Supported 5.4] Free Shared Site Registration [Closed] |
Minor
[Terminology]
"Free Shared Site"
Reason: The hypothesis uses 'Free Shared Site' but should be 'Free API proxy site' for consistency and accuracy. | Should be 'free API proxy site' to match community terminology | ‘公益站’ is better rendered as ‘free API proxy site’; ‘Shared Site’ under-specifies the API/proxy nature. |
| google/gemma-3-12b-it | Title: [CurryAPI][Supports 5.4] Public Proxy Site Open for Registration [Closed] Content: Limited quantity, unlimited time [until 2000 people], first come, first served. Website: https://stephecurry.asia See here. 100 10,000 credit redemption codes LINUX DO CDK | Pass (no consensus errors) |
| google/gemma-3-1b-it | Limited spots available, first come first served, website: https://stephecurry.asia |
Major
[Accuracy]
"Limited spots available, first come first served, website: https://stephecurry.asia"
Reason: Incomplete translation - missing title and other content details about the CurryAPI site and redemption codes. | Severe under-translation. Misses the title information (CurryAPI, 5.4 support, closed status) and content details (2000 people limit, CDK codes). | Captures part of the meaning but omits that it is a free API proxy site, now supporting 5.4, and misses mention of registration/closed status and quota redemption codes. |
| google/gemma-3-4b-it | Title: [CurryAPI][Supported 5.4] Community Site Open Registration [Closed] Content: Limited quantity and time [Up to 2000 people], first come, first served URL: https://stephecurry.asia See here 100 1w quota exchange code LINUX DO CDK |
Minor
[Accuracy]
"Limited quantity and time [Up to 2000 people]"
Reason: Source says limited quantity with no time limit; hypothesis adds “and time”, which contradicts original meaning. | Source says '限量不限时' (Limited quantity, unlimited time). Hypothesis says 'Limited quantity and time', which contradicts the source. | Should be 'Limited quantity, no time limit [Up to 2000 people]' not 'and time'
Minor
[Fluency]
"100 1w quota exchange code"
Reason: Reference says 'redemption codes' - minor plural form difference | Expression is awkward; should be “100 redemption codes for 10k quota” or similar; “1w” is unexplained slang. | Should be '100 redemption codes for 10k quota' for better clarity | '1w' (10,000) is Chinese internet slang. Should be expanded to '10k' for English readers. |
| google/translategemma-12b-it | Title: [CurryAPI] [Supports Version 5.4] Public API Site Open for Registration [Closed] Content: Limited quantity, unlimited time [until 2000 people], first come, first served. Website: https://stephecurry.asia See here for 100 redemption codes with a $10,000 credit limit: LINUX DO CDK |
Minor
[Accuracy]
"$10,000 credit limit"
Reason: The source says '1w 额度' (10k quota/credits). Adding the '$' sign assumes currency which might not be accurate; '10k quota' (reference) is safer. | The hypothesis says '$10,000' when the reference says '10k quota', which is clearer. Also, the structure should be '100 redemption codes for 10k quota via LINUX DO CDK' not 'with a $10,000 credit limit'. | "1w额度" is 10,000 units of quota, not necessarily a monetary $10,000 limit; adding currency and value is incorrect. | Incorrect number: '10k' in reference should be '1w' (10,000) but hypothesis says '$10,000' adding a dollar sign that wasn't in the source |
| google/translategemma-4b-it | Title: 【CurryAPI】【Supported up to 5.4】Public API Site Open Registration【Closed】 Content: Limited and temporary【Up to 2000 people】, First come, first served URL: https://stephecurry.asia See here 100 1w quota exchange codes LINUX DO CDK | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '公益站开放注册【已关闭]' in en |
严重
[准确性]
"Translation of '公益站开放注册【已关闭]' in en"
理由: Hypothesis is not a translation but a placeholder text indicating translation request; also incomplete source text | Placeholder; does not explicitly render 'Free API proxy site is open for registration [Closed]'. | The hypothesis is a meta-description string, not the actual translation. It also only covers a fragment of the title, ignoring the 'CurryAPI' and version info. |
| tencent/HY-MT1.5-7B | "en": "Title: [CurryAPI] [Supported since version 5.4] Registration open for public welfare sites [Closed now]", "zh-TW": "Title: [CurryAPI] [Supported since version 5.4] Registration open for public welfare sites [Closed now]", "ja": "Title: [CurryAPI] [Version 5.4以降でサポートされている] 公共 Welfare サイトの登録が開始されました [現在は閉鎖中]", "ko": "Title: [CurryAPI] [버전 5.4 이상에서 지원됨] 공공 Welfare 사이트의 등록이 시작되었습니다 [현재는 폐쇄 중]", "de":="Title: [CurryAPI] [Ab Version 5.4 unterstützt] Registrierung für öffentliche Welfare-Sites ist jetzt möglich [Derzeit geschlossen]", "fr": "Title: [CurryAPI] [Déjà supporté depuis la version 5.4] Inscription ouverte pour les sites de bienfice public [Actuellement fermée]", "it": "Title: [CurryAPI] [Disponibile a partire dalla versione 5.4] Iscrizione aperta per i siti di pubblica utilità [Attualmente chiusa]"}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【CurryAPI】Public Site【Maybe on sonnet4.5】 Content: 【CurryAPI】[Has supported 5.4] Public Station registration open [Fiber wool benefits] Limited time and quantity [Up to 2000], first come first served. Check here to register. Registration has been changed to level 0. |
严重
[准确性]
"[Fiber wool benefits]"
理由: '羊毛' is slang for 'deals', 'perks', or 'freebies' (literally 'wool', as in 'pulling wool' from a deal). Translating it as 'Fiber wool' is a nonsensical literal translation that loses the meaning. | Should be 'Freebies/perks' not 'Fiber wool benefits'. '福利羊毛' means freebies/perks, not fiber wool. | Mistranslates the internet slang '羊毛' (as in '薅羊毛', i.e. freebies/perks) as 'Fiber wool', completely losing the 'perks/freebies' meaning.
严重
[准确性]
"Limited time and quantity"
理由: The source '限量不限时' means limited quantity with no time limit; the translation adds 'time', implying a time limit, which is the opposite of the source. | The source and reference specify 'no time limit,' but the hypothesis incorrectly includes 'time-bound.' | Source says '限量不限时' (Limited quantity, unlimited time). Hypothesis says 'Limited time and quantity', which contradicts the source regarding the time limit. |
| CohereLabs/tiny-aya-water | Title: 【CurryAPI】Public Site【may be updated to sonnet4.5】 |
严重
[准确性]
"Title: 【CurryAPI】Public Site【may be updated to sonnet4.5】"
理由: Omits the source content entirely. 'Public Site' is a weak translation for '公益站' (Free API proxy site / Public welfare site in this context). | The translation omits 'free API proxy' in the reference. | Incorrect: '可能会上sonnet4.5' means 'might add sonnet 4.5', not 'may be updated to sonnet4.5' |
| Qwen/Qwen2.5-14B-Instruct | Title: [CurryAPI] Free Shared API Site [May Support Sonnet 4.5] Content: [CurryAPI][Supporting 5.4] Community Registration Open for Free Shared API Site - Limited but Unlimited Time Offers Until 2000 Users, First Come First Served See Here Registration Now Requires Level 0 |
轻微
[流畅性]
"Community Registration Open for Free Shared API Site - Limited but Unlimited Time Offers Until 2000 Users"
理由: Confusing and contradictory phrasing ('Unlimited Time Offers Until'). | Awkward phrasing. Should be 'Free API proxy site is open for registration. Freebies/perks, limited quantity but no time limit [Up to 2000 people]' | Reference says 'Free API proxy site is open for registration' - more concise | Phrase "Limited but Unlimited Time Offers" is awkward; clearer as "limited to 2000 users with no time limit". |
| Qwen/Qwen2.5-7B-Instruct | Title: [CurryAPI] Free Shared Site【May Support sonnet4.5】 Content: [CurryAPI] Now supports version 5.4. The free shared site is open for registration. Limited offer for level 0 registration available until 2000 people, first come, first served. Check here. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | ...} | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: [CurryAPI] Free Shared API Site [May be available on Sonnet 4.5] Content: [CurryAPI][Supports 5.4] Free Shared API Site Registration Open - Free Benefits & Unlimited Access (Limited to 2000 users), First come first served. See here. Registration now requires Level 0. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [CurryAPI] Free Shared API Site [May Support Sonnet 4.5] Content: [CurryAPI][Now Supports 5.4] Public Proxy Registration Open - Free Benefits and Deals, Limited Quantity but Unlimited Time [Up to 2000 Users], First Come First Served. See Here. Registration Now Requires Level 0 to Register. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [CurryAPI] Free Shared Site [May Support sonnet4.5] |
轻微
[术语]
"Free Shared Site"
理由: Should be 'Free API proxy site' instead of 'Free Shared Site' for consistency and accuracy. | As above, ‘公益站’ is more specifically a ‘free API proxy site’; ‘Shared Site’ is vague and nonstandard for this context. | Should be 'free API proxy site' to match community terminology |
| google/gemma-3-12b-it | Title: [CurryAPI] Public Proxy [May be on sonnet4.5] Content: [CurryAPI] [Supports 5.4] Public Proxy open registration Welfare wool Limited time and quantity [Up to 2000 people], first come, first served See here Registration has been changed to level 0 can register | 通过 (无共识错误) |
| google/gemma-3-1b-it | CurryAPI - A free shared API site with potential to be on sonnet4.5 |
严重
[准确性]
"CurryAPI - A free shared API site with potential to be on sonnet4.5"
理由: Mostly accurate but slightly vague; “可能会上sonnet4.5” is captured, but omits mention that it already supports 5.4 and is open with limited registration slots and reduced level requirement. | Severe under-translation. Only attempts a partial title translation, missing all content details about registration requirements, limits, and current support (5.4). | Incomplete translation - missing content section about registration details and level requirements. |
| google/gemma-3-4b-it | Title: [CurryAPI] Free Shared API Site [May Appear on sonnet4.5] |
轻微
[准确性]
"[CurryAPI] Free Shared API Site [May Appear on sonnet4.5]"
理由: Reference says 'Might add sonnet 4.5' - '可能会上' means might add/appear, acceptable translation | Should be 'Free API proxy site' not 'Free Shared API Site' | The term '公益站' in this context refers to a free service provided by an individual or group (often a proxy), not necessarily a 'shared' site. 'Free API proxy site' or 'Free public station' is more accurate. |
| google/translategemma-12b-it | Title: [CurryAPI] Public API Site [May use sonnet4.5] Content: [CurryAPI] [Supports 5.4] Public API site is now open for registration. Limited-time and quantity benefits (up to 2000 people), first come, first served. See here. Registration has been changed to allow registration at level 0. |
轻微
[准确性]
"Limited-time and quantity benefits (up to 2000 people)"
理由: The hypothesis says 'Limited-time and quantity benefits' when the reference correctly states 'limited quantity but no time limit', which are opposite meanings. | Source "限量不限时【2000人为止】" says limited quantity with no time limit; adding "Limited-time" contradicts the original. | The source says '限量不限时' (limited quantity, unlimited time). The hypothesis says 'Limited-time and quantity', which contradicts the 'unlimited time' part. |
| google/translategemma-4b-it | Title: 【CurryAPI】Free Shared API Site【May be on sonnet4.5】 Content: 【CurryAPI】【Supports 5.4】Free Shared API Site open registration Welfare wool Limited quantity and time【Up to 2000 people】, First come first served See here Registration has been changed to 0 level registration |
轻微
[流畅性]
"Registration has been changed to 0 level registration"
理由: Poor translation - 'Welfare wool' is awkward phrasing for '福利羊毛' (freebies/perks). Also 'Limited quantity and time' should be 'limited quantity but no time limit'. Needs better punctuation and flow. | Unnatural repetition; better as "The requirement has been lowered so level 0 users can register." | Awkward phrasing; 'level 0 registration' or 'registration open to level 0' is more natural in English. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of 'CurryAPI】公益站【可能会上sonnet4.5\' in en", "zh-TW": "Translation of 'CurryAPI】【已支持5.4】公益站开放注册 福利羊毛 限量不限时【2000人为止】,先到先得 看这里 注册已改为0级即可注册' in zh-TW", "ja": "Translation of 'CurryAPI】【已支持5.4】公益站开放注册 福利羊毛 限量不限时【2000人为止】,先到先得 看这里 注册已改为0级即可注册' in ja", "ko": "Translation of 'CurryAPI】【已支持5.4】公益站开放注册 福利羊毛 限量不限时【2000人为止】,先到先得 看这里 注册已改为0级即可注册' in ko", "de": "Translation of 'CurryAPI】【已支持5.4】公益站开放注册 福利羊毛 限量不限时【2000人为止】,先到先得 看这里 注册已改为0级即可注册' in de", "fr": "Translation of 'CurryAPI】【已支持5.4】公益站开放注册 福利羊毛 限量不限时【2000人为止】,先到先得 看这里 注册已改为0级即可注册' in fr", "it": "Translation of 'CurryAPI】【已支持5.4】公益站开放注册 福利羊毛 限量不限时【2000人为止】,先到先得 看这里 注册已改为0级即可注册' in it} | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Title: [CurryAPI] Public Welfare Site [May Support sonnet4.5] | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【NPC公益站】Invitation Code, Official Link Content: NPCAPI Invitation Code NPCAPI Farm Reservation Official leader added invitation code, everyone can view in real time, and later invitation codes will also be placed here | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: 【NPC公益站】Invitation Code, Official Link Content: NPCAPI Invitation Code NPCAPI Farm Reservation Official leader added invitation code, everyone can view in real time, and later invite codes will be put here as well | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: [NPC Free Shared Site] Invitation Code, Official Link Content: NPCAPI Invitation Code NPCAPI Farm Booking The official has added an invitation code, everyone can check it in real time and we will also keep subsequent invitation codes here |
轻微
[术语]
"Free Shared Site"
理由: Should be 'Free API Proxy Site' for consistency | Generic translation for '公益站'. | Reference uses 'Free API proxy site' - inconsistent terminology |
| Qwen/Qwen2.5-7B-Instruct | Title: [NPC Public Proxy] Invitation Code, Official Link Content: NPCAPI Invitation Code NPCAPI Farm Reservation The official has added invitation codes, everyone can check them in real-time, and we will also put future invitation codes here. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Title: [NPC Free Shared API Site] Invitation Code, Official Link Content: NPCAPI Invitation Code NPCAPI Farm Reservation The official team has added invitation codes, which everyone can view in real-time, and future invitation codes will also be placed here. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: [NPC Free Shared API Site] Invite Code, Official Link Content: NPCAPI Invite Code NPCAPI Farm Reservation The official team has added an invite code, everyone can check it in real-time, and future invite codes will also be placed here. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [NPC Public Proxy] Invitation Code, Official Link Content: NPCAPI Invitation Code NPCAPI Farm Reservation The official senior members have added invitation codes, everyone can check them in real-time, and subsequent invitation codes will also be placed here. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [NPC Public Proxy] Invitation Code, Official Link Content: NPCAPI Invitation Code, Farm Reservation Official Big Guys have added invitation codes. You can view them in real-time here and also add subsequent invitation codes. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: [NPC Public Proxy] Invitation Code, Official Link Content: NPCAPI invitation code, NPCAPI farm reservation. Official experts have added invitation codes, so everyone can view them in real time and add future invitation codes here. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of '【NPC公益站】邀请码,官方链接' in en |
致命
[准确性]
"Translation of '【NPC公益站】邀请码,官方链接' in en"
理由: The model outputted a meta-statement instead of performing the translation. | The translation is incomplete and does not provide the actual content of the source text. | Not a translation - this is a meta-description. Should be '[NPC Free API Proxy Site] Invitation codes, official links'. Missing content section. | Translation is a generic placeholder 'Translation of...' instead of actual translation. Content section is completely omitted | Meta-text only; does not translate the content into English, so the meaning is not conveyed. |
| google/gemma-3-4b-it | Title: [NPC Free Shared API Site] Invitation Code, Official Link Content: NPCAPI invitation code NPCAPI farm reservation Official seniors have added invitation codes, everyone can view in real time, and subsequent invitation codes should also be placed here. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [NPC Public Proxy] Invitation Code, Official Link Content: NPCAPI invitation code. NPCAPI farm reservation. Official users have added invitation codes, so everyone can check them in real time, and please add any subsequent invitation codes here as well. |
轻微
[准确性]
"Official users have added invitation codes"
理由: "官方大佬" is more like "official admin/boss"; "users" slightly understates authority but meaning is mostly preserved. | The source '官方大佬' refers to the admin/boss ('official boss' in reference), not plural 'users'. | The hypothesis says 'Official users' when the reference says 'The official boss', which better captures the meaning of '官方大佬'. |
| google/translategemma-4b-it | Title: 【NPC Free Shared API Site】Invitation Code, Official Link Content: NPCAPI Invitation Code NPCAPI Farm Booking Official admins have added invitation codes, everyone can view in real time, and also put invitation codes here later. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Title: 【NPC公益站】邀请码,官方链接 Content: NPCAPI邀请码 NPCAPI农场预约 官方大佬增加了邀请码,大家可以实时查看,并且后续邀请码也放入这里吧 |
严重
[准确性]
"Title: 【NPC公益站】邀请码,官方链接
Content: NPCAPI邀请码 NPCAPI农场预约 官方大佬增加了邀请码,大家可以实时查看,并且后续邀请码也放入这里吧"
理由: Hypothesis is identical to source text, not a translation; target language is zh but source is already in zh | The hypothesis is identical to the source text (Chinese). It fails to translate the content into English as required. | This is just the original Chinese text copied as hypothesis without translation into the target language (English as per context). |
| tencent/HY-MT1.5-7B | Title: [NPC Public Welfare Site] Invitation Code, Official Link | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | As a programmer, I haven't set up a small shrimp server yet... |
致命
[准确性]
"a small shrimp server"
理由: '小龙虾' is slang for OpenClaw. Translating it as 'small shrimp server' is a literal mistranslation that loses the technical meaning entirely. | '小龙虾' is a nickname for OpenClaw and here refers to deploying OpenClaw; rendering it as 'small shrimp server' takes the translation completely off topic. | Should be 'As a programmer, I haven't deployed OpenClaw yet'. '小龙虾' is slang for OpenClaw, not literal shrimp. |
| CohereLabs/tiny-aya-water | Title: As a programmer, I haven't set up a shrimp tower yet |
严重
[准确性]
"shrimp tower"
理由: The translation uses slang ('shrimp tower') instead of the proper term 'OpenClaw'. | Should be 'OpenClaw'. '小龙虾' is a metaphor for OpenClaw, not literal shrimp tower | "小龙虾" is slang for OpenClaw; "shrimp tower" is an incorrect literalization that obscures intended meaning of deploying OpenClaw. | Hallucination/Mistranslation. Source says '搭建过小龙虾' (deployed OpenClaw). 'Shrimp tower' is completely made up and incorrect. |
| Qwen/Qwen2.5-14B-Instruct | Title: As a programmer, I haven't set up a small lobster yet Content: Mainly because I have no idea how to set up a server... messing around by setting one up myself, then what can it be used for? Messing around with CPA at least using friends' provided codex number pool can save money on programming |
严重
[准确性]
"small lobster"
理由: "小龙虾" is OpenClaw here; literal "small lobster" hides the intended software/tool meaning. | Misses the slang. '小龙虾' refers to OpenClaw deployment, not literal lobster. Should be 'deployed OpenClaw' | Failed to translate slang '小龙虾' (OpenClaw).
严重
[准确性]
"save money on programming"
理由: Mistranslation of '白嫖' (freeload/get for free). 'Save money' implies paying less, not getting it free. | "白嫖" means getting for free/freeloading; "save money" softens the freeloading nuance but core idea of cost avoidance is kept. | Reference says 'code for free' (白嫖), not 'save money' - different meaning |
| Qwen/Qwen2.5-7B-Instruct | Title: As a Programmer, I Haven’t Set Up Little Shrimp Yet Content: Mainly because I have no idea how to set up a server... Tinkering with setting one up myself, then what can be used for? Tinkering with CPA at least gives users access to the code number pool provided by friends so they can get programming for free |
严重
[准确性]
"Little Shrimp"
理由: Mistranslation of '小龙虾'. Should be 'OpenClaw', not literal 'Little Shrimp'. | The translation is accurate but could be more precise. | Failed to recognize '小龙虾' as the slang for 'OpenClaw'. Literal translation renders the sentence meaningless for a programmer. | '小龙虾' is slang for OpenClaw, not literal shrimp. |
| Qwen/Qwen3-14B | Als Programmierer haben Sie noch nie einen Krebs aufgebaut? |
严重
[准确性]
"Als Programmierer haben Sie noch nie einen Krebs aufgebaut?"
理由: The term 'Krebs' (crabs) is a mistranslation of '龙虾' (lobsters). | Changes person from first to second and literal “Krebs” (crab) instead of OpenClaw deployment; meaning is incorrect. | Translation is in German instead of Chinese. Target language is zh (Chinese), not German. Also 'Krebs' (crab) is incorrect terminology. |
| Qwen/Qwen3.5-4B | As a programmer, I haven't even set up a crawfish farm. |
轻微
[准确性]
"As a programmer, I haven't even set up a crawfish farm."
理由: “搭建过小龙虾” is slang for deploying OpenClaw, not a literal crawfish farm; key technical meaning is lost and content is truncated, omitting the explanation about servers and using Codex pool. | Missing Content section - only Title was provided in hypothesis. | The hypothesis only translates the title and misses the entire content. Furthermore, it translates '小龙虾' literally as 'crawfish farm' instead of the slang 'OpenClaw'. | Incomplete translation; should include full title and content. 'crawfish farm' should be 'deployed OpenClaw' |
| Qwen/Qwen3.5-9B | Title: As a programmer who has never set up a crayfish Content: Mainly because I have no idea how to set up a server... fiddling around setting one up myself, then what can it be used for? Fiddling with CPA at least allows me to use the codex number pool provided by senior users to get programming for free. |
严重
[准确性]
"set up a crayfish"
理由: "搭建过小龙虾" is slang for deploying OpenClaw; literal "crayfish" is incorrect in this community context. | Should be 'deployed OpenClaw' not 'set up a crayfish'; misses slang meaning | Literal translation of slang '搭建过小龙虾'. Should be 'deployed OpenClaw'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | As a programmer, I haven't set up a small shrimp yet. |
严重
[准确性]
"small shrimp"
理由: The hypothesis only translates the title and omits the content text ('主要是完全不知道弄个服务器...'). | The hypothesis translates '小龙虾' as 'small shrimp' but should be 'OpenClaw'. This is a consistent mistranslation of the slang term. | ‘小龙虾’ in this community slang refers to OpenClaw, not a literal small shrimp; this loses correct referent.
严重
[术语]
"small shrimp"
理由: Mistranslation of '小龙虾'. Refers to 'OpenClaw', not literal shrimp. | Incorrect translation - '小龙虾' here refers to OpenClaw (software), not literal small shrimp. Should be 'OpenClaw' | The term '小龙虾' should be translated as 'OpenClaw'. |
| google/gemma-3-12b-it | Title: As a programmer, I haven't set up crawfish yet Content: Mainly because I have no idea how to set up a server... Trying to build one myself and then what can I use it for? At least when tinkering with CPA, you can freeload from the codex number pool provided by old friends. |
致命
[准确性]
"crawfish"
理由: Misses slang meaning. '小龙虾' refers to OpenClaw, not crawfish. Should be 'deployed OpenClaw'. | Literal translation of '小龙虾' (OpenClaw). Misses the slang entirely. | Again "小龙虾" stands for OpenClaw; literal "crawfish" mistranslates the core concept. |
| google/gemma-3-1b-it | As a programmer, I haven't set up a server yet... |
严重
[准确性]
"As a programmer, I haven't set up a server yet..."
理由: Severe under-translation and terminology error. Misses '小龙虾' (OpenClaw) slang, translating it implicitly as 'server' or omitting it. Omits the entire second sentence about CPA and free coding. | Incomplete translation - only partial title translation. Should be 'As a programmer, I haven't deployed OpenClaw yet'. Missing content section about CPA and Codex account pool. | Misrepresents the title which is about not having deployed OpenClaw (“小龙虾”). Also omits the rest of the content about servers, CPA, and using Codex account pools. |
| google/gemma-3-4b-it | As a programmer, I haven't even set up a crayfish server yet. |
严重
[准确性]
"crayfish server"
理由: Failed to translate slang '小龙虾' as 'OpenClaw'. 'Crayfish server' is a literal and incorrect interpretation. | “小龙虾” is slang for OpenClaw, not actual crayfish; meaning is distorted. | Should be 'As a programmer, I haven't deployed OpenClaw yet' not 'crayfish server' |
| google/translategemma-12b-it | Title: As a programmer, I haven't set up a "xiaolongxia" server yet. |
严重
[术语]
""xiaolongxia" server"
理由: "小龙虾" is slang for OpenClaw; leaving it as untranslated nickname without explaining may be ambiguous, but more importantly it fails to map to the established term OpenClaw used in context/reference. | Incorrect terminology: 'xiaolongxia' (pinyin) should be 'OpenClaw' - in this context, 小龙虾 is slang for OpenClaw (AI coding tool), not actual lobster; also using pinyin instead of English is inconsistent | Should be 'OpenClaw' instead of 'xiaolongxia'. |
| google/translategemma-4b-it | Title: As a programmer, I haven't set up a lobster server yet |
严重
[准确性]
"lobster server"
理由: Literal translation of '搭建过小龙虾'. Should be 'deployed OpenClaw'. | Mistranslation - '小龙虾' is slang for OpenClaw, not literal 'lobster'. Should be 'As a programmer, I haven't deployed OpenClaw yet'. | "小龙虾" here is slang for OpenClaw; literal "lobster" misleads the reader into thinking it's about an animal-related server instead of deploying the tool. | The translation incorrectly translates 'OpenClaw' as 'lobster'. |
| tencent/HY-MT1.5-1.8B | Translation of '作为一名程序员还没搭建过小龙虾' in en |
严重
[准确性]
"Translation of '作为一名程序员还没搭建过小龙虾' in en"
理由: Placeholder; fails to provide actual translation about not having deployed OpenClaw as a programmer. | The hypothesis is a meta-description of the task, not the actual translation of the source text. | Hypothesis is not a translation but a placeholder text indicating translation request |
| tencent/HY-MT1.5-7B | As a programmer, I haven't set up a crayfish yet. |
严重
[准确性]
"crayfish"
理由: Source uses '小龙虾' (crayfish) as slang for OpenClaw, but hypothesis translates it literally as 'crayfish' instead of 'OpenClaw' | Literal translation of slang '小龙虾'. Refers to deploying 'OpenClaw', not an animal. | "小龙虾" is slang for OpenClaw; rendering as crayfish changes the topic from a deployment/tool to animals. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Can’t use Dalu Longxia on PC? [Version issue] |
致命
[准确性]
"Dalu Longxia"
理由: Should be 'OpenClaw'. '大龙虾' is slang for OpenClaw, not 'Dalu Longxia'. | The source '大龙虾' is a nickname for OpenClaw; the transliteration 'Dalu Longxia' is both unintelligible and loses the reference to OpenClaw, seriously hindering comprehension. | '大龙虾' is slang for OpenClaw. The hypothesis transliterates it as 'Dalu Longxia', which is meaningless to an English reader. It should be translated as OpenClaw or explained. |
| CohereLabs/tiny-aya-water | Content: Is this a cut-down version or is my posture wrong? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Can't use Lobster in QQ for PC? [Version Issue] Content: Is it a castrated version, or am I doing something wrong? |
严重
[准确性]
"Lobster"
理由: Failed to translate slang '大龙虾' (OpenClaw). | "大龙虾" here refers to OpenClaw; translating as literal "Lobster" loses the product meaning. | Misses the slang. Should be 'Can't the PC version of QQ use OpenClaw?' |
| Qwen/Qwen2.5-7B-Instruct | PC Version of QQ Not Working with LinuxDo? [Version Issue] |
轻微
[准确性]
"LinuxDo"
理由: Completely wrong entity; source refers to OpenClaw, not LinuxDo. | Incorrect translation. Should be 'Can't the PC version of QQ use OpenClaw?' The '大龙虾' refers to OpenClaw, not LinuxDo. | The translation is accurate but could be more precise. | Hallucination/Entity Error. The source says '大龙虾' (slang for OpenClaw). The hypothesis translates it as 'LinuxDo' (the name of the forum), which changes the meaning completely. The question is about using OpenClaw with QQ, not using the forum itself. |
| Qwen/Qwen3-14B | Non si può usare Da Long Xie sulla versione PC di QQ? [Problema della versione] | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Can't use Spiny Lobster on PC version of QQ? [Version Issue] Content: Is it a censored version, or am I doing something wrong? |
严重
[准确性]
"Spiny Lobster"
理由: Should be 'OpenClaw'; 'Spiny Lobster' is incorrect translation of the slang term | Failed to translate the slang '大龙虾' (Big Lobster) which refers to 'OpenClaw'. The translation implies a biological animal. | “大龙虾” in this community context is a playful name for OpenClaw, not actual spiny lobster; should map to OpenClaw to preserve intended reference. |
| Qwen/Qwen3.5-9B | Title: Can't use Big Lobster on PC version QQ? [Version Issue] Content: Is it a crippled version, or am I doing something wrong? |
严重
[准确性]
"Big Lobster"
理由: "大龙虾" is slang for OpenClaw; literal "Big Lobster" is misleading for the intended audience. | Should be 'OpenClaw' not 'Big Lobster'; misses slang meaning | Literal translation of slang '大龙虾'. Refers to 'OpenClaw'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Is PC QQ unable to use Big Prawn? [Version issue] |
严重
[准确性]
"Big Prawn"
理由: The hypothesis translates '大龙虾' as 'Big Prawn' but should be 'OpenClaw'. Also the phrasing is awkward; should be 'Can't the PC version of QQ use OpenClaw?' | The hypothesis only translates the title and omits the content text ('阉割版吗,还是我姿势不对?'). | ‘大龙虾’ is OpenClaw in context; translating as ‘Big Prawn’ treats it literally and loses the product/service meaning.
严重
[术语]
"Big Prawn"
理由: Incorrect translation - '大龙虾' is a nickname for OpenClaw, not literal big prawn. Should be 'OpenClaw' | The term '大龙虾' should be translated as 'OpenClaw'. | Mistranslation of '大龙虾'. Refers to 'OpenClaw', not literal prawn. |
| google/gemma-3-12b-it | Title: Can I use Lobster on the PC version of QQ? [Version Issue] Content: Is it a cut-down version, or am I doing something wrong? |
致命
[准确性]
"Lobster"
理由: "大龙虾" in this forum context refers to OpenClaw; translating as literal "Lobster" loses the product name meaning. | Misses slang meaning. '大龙虾' refers to OpenClaw, not lobster. Should be 'use OpenClaw'. | Failed to translate '大龙虾' as 'OpenClaw'. Literal interpretation makes the sentence nonsensical (using a crustacean on QQ). |
| google/gemma-3-1b-it | "en": "PC version of QQ can't be used with big shrimp?", "zh-TW": "阉割版吗,还是我姿势不对?", "ja": "阉割版ですか、それとも姿勢が間違っていますか?", "ko": "阉切り版인가요, 아니면 자세가 틀렸나요?", "de": "Ist eine Modifizierte Version von QQ nicht erlaubt?“, "fr": "Version modifiée de QQ ne peut pas être utilisée avec des crevettes géantes ?", "it": "Versione modificata di QQ non può essere utilizzata con gamberi giganti?" } |
致命
[准确性]
""en": "PC version of QQ can't be used with big shrimp?","
理由: “big shrimp” is a literal rendering that should be “OpenClaw” in this slang context. | Invalid format - hypothesis is a JSON object with multiple languages instead of a single English translation. Should be 'Can't the PC version of QQ use OpenClaw? [Version issue]'. Missing content section. | The model outputted a JSON object with multiple languages instead of a simple English translation. Also mistranslates '大龙虾' (OpenClaw slang) as 'big shrimp'. |
| google/gemma-3-4b-it | Title: PC Version QQ Can’t Use Big Lobster? [Version Issue] Content: Is it censored, or am I doing it wrong? |
严重
[准确性]
"Big Lobster"
理由: Failed to translate slang '大龙虾' as 'OpenClaw'. | “大龙虾” here refers to OpenClaw, not literal lobsters; product/system name is mistranslated. | Should be 'OpenClaw' not 'Big Lobster' |
| google/translategemma-12b-it | Title: Can I use Lobster on the PC version of QQ? [Version Issue] Content: Is it a stripped-down version, or am I doing something wrong? |
严重
[术语]
"Can I use Lobster on the PC version of QQ?"
理由: Incorrect terminology: 'Lobster' should be 'OpenClaw' - in this context, 大龙虾 is slang for OpenClaw (AI coding tool), not actual lobster | Should be 'OpenClaw' instead of 'Lobster'. | "大龙虾" is slang for OpenClaw; literal "Lobster" loses the intended software meaning. |
| google/translategemma-4b-it | Title: Can't use Lobster on PC version? [Version issue] |
严重
[准确性]
"Lobster"
理由: The translation incorrectly translates 'OpenClaw' as 'Lobster'. | Mistranslation - '大龙虾' is slang for OpenClaw, not literal 'Lobster'. Should be 'Can't the PC version of QQ use OpenClaw? [Version issue]'. | "大龙虾" in this context is a nickname for OpenClaw; literal "Lobster" loses the software/tool reference and confuses meaning. | Literal translation of '大龙虾'. Should be 'use OpenClaw'. |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
致命
[准确性]
"Translation of '登录' in en"
理由: Hypothesis is not a translation of the source content but a completely unrelated placeholder text 'Login' | The hypothesis is a meta-description referencing '登录' (Login), which is not the source text. It fails to translate the actual source about QQ and OpenClaw. | Source does not contain '登录'; hypothesis is unrelated to the issue of using OpenClaw on PC QQ. |
| tencent/HY-MT1.5-7B | Title: Can't Use LinuxDo on PC Version of QQ? [Version Issue] |
严重
[准确性]
"LinuxDo"
理由: The original "大龙虾" refers to OpenClaw (a tool/service), not LinuxDo; this changes the product being discussed. | Severe hallucination/mistranslation. The source says '大龙虾' (Big Lobster/OpenClaw). The hypothesis translates it as 'LinuxDo' (the name of the forum community), which changes the meaning entirely. It should be 'OpenClaw'. | Hypothesis introduces 'LinuxDo' which is not in the source; source uses '大龙虾' (big lobster) as slang for OpenClaw |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Google has unblocked my account, I must knowingly violate rules, refuse to change, and continue reverse proxying | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Google has unblocked my account, I must knowingly violate rules, refuse to change, and continue reverse proxying | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Google has unblocked my account, I will deliberately continue to use reverse proxy despite knowing it's wrong Content: I submitted the form, and Google has unblocked my account; I intend to continue using reverse proxy to test if this account gets permanently banned |
轻微
[风格]
"deliberately continue to use reverse proxy despite knowing it's wrong"
理由: Wordy expansion of '明知故犯'. Reference is punchier. | Paraphrases "明知故犯,死不悔改" in a more formal tone; semantic content preserved but loses rhetorical emphasis. | Reference says 'I will knowingly violate the rules, refuse to repent, and continue reverse proxying' - more complete translation |
| Qwen/Qwen2.5-7B-Instruct | Title: Google unfroze my account, I want to knowingly commit the offense and never repent, continue using reverse proxy Content: I submitted the form, Google unfroze my account, I want to continue using reverse proxy, test if this account will be permanently banned again | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Google ha sbloccato il mio account, voglio violare intenzionalmente le regole e non pentirmi mai, continuare ad usare il reverse proxy | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Google unblocked my account; I will knowingly break rules and refuse to repent, continuing reverse proxy. Content: I submitted a form, Google unblocked my account, I will continue reverse proxying, testing whether this account gets banned again, or if it's a permanent ban. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Google unblocked my account; I intend to knowingly violate the rules and refuse to repent, continuing with reverse proxy usage. Content: I submitted the form, Google unblocked my account, and I plan to continue using reverse proxy to test if this account gets banned again, whether permanently. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Google has unblocked my account, but I know better next time and will continue reverse proxying. | Major<br>[Accuracy]<br>"but I know better next time"<br>Reason: Incorrect translation - '明知故犯' means 'to knowingly violate (rules)' not 'know better next time'. The reference correctly translates this as 'knowingly violate the rules' \| The hypothesis misses the key meaning. '明知故犯,死不悔改' means 'knowingly violate the rules and refuse to repent', not 'know better next time'. The hypothesis contradicts the source's intent. \| Original expresses ‘will knowingly violate (明知故犯), refuse to repent’; ‘I know better next time’ implies the opposite attitude, reversing meaning. \| Mistranslation of '明知故犯'. The source means 'knowingly commit the offense again'. The hypothesis 'know better next time' implies learning from mistakes and behaving, which is the opposite of the source meaning. |
| google/gemma-3-12b-it | Title: Google unbanned my account; I want to knowingly commit offenses and remain steadfastly unrepentant, continuing with reverse proxy. Content: I submitted a form, and Google unbanned my account. I want to continue using a reverse proxy to test whether this account will be banned again and if the ban will be permanent. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Translation of '谷歌解封了我的账号' in en | Critical<br>[Accuracy]<br>"Translation of '谷歌解封了我的账号' in en"<br>Reason: Meta-text only; does not provide the actual translation (“Google unbanned my account”). \| The translation is incomplete and does not provide the actual content of the source text. \| The model outputted a meta-statement instead of performing the translation. \| Not a translation - this is a meta-description. Should provide actual translation of the full title and content about Google unbanning the account. |
| google/gemma-3-4b-it | Title: Google Unblocked My Account, I Will Intentionally Sin and Not Repent, Continuing Reverse Proxying Content: I submitted the form, Google unblocked my account, I will continue reverse proxying to test if the account will be permanently banned again. | Minor<br>[Style]<br>"Intentionally Sin and Not Repent"<br>Reason: Overly literal/biblical; more natural would be “knowingly break the rules and refuse to repent” to match context and tone. \| Reference says 'I will knowingly violate the rules, refuse to repent' - slightly different phrasing but similar meaning \| '明知故犯' is idiomatically 'knowingly violate the rules'. 'Sin' has religious connotations that don't fit the technical context. |
| google/translategemma-12b-it | Title: Google has unbanned my account. I want to knowingly commit offenses and refuse to change, continuing to use reverse proxy. Content: I submitted a form, and Google has unbanned my account. I want to continue using reverse proxy to test whether this account will be banned again and if the ban will be permanent. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Google has unblocked my account, I will knowingly continue reverse proxying, and not change my ways, to test if the account will be permanently blocked again. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Google has unlocked my account. I want to act knowingly and without regret, and continue with reverse proxying. | Minor<br>[Style]<br>"I want to act knowingly and without regret, and continue with reverse proxying."<br>Reason: The phrase 'act knowingly and without regret' is a literal and slightly awkward translation of '明知故犯,死不悔改' (knowingly violate rules and refuse to repent). 'Knowingly violate the rules and refuse to repent' (as in reference) captures the defiant tone better. \| Slightly stiff and literal; "act knowingly" is awkward. Meaning is preserved but style is less natural than the reference. \| Hypothesis is in English but target language is zh; however translation is semantically close to reference |
| tencent/HY-MT1.5-7B | "en": "Title: Google has unblocked my account, but I will deliberately violate the rules and continue to use reverse proxy.", "zh-TW": "标题:谷歌已解封我的账户,但我仍将故意违反规则并继续使用反向代理。", "ja": "タイトル:Googleが私のアカウントを解除したが、わざと規則に違反してリバースプロキシを続けて使用する。」, "ko": "제목: Google이 내 계정을 해제했지만, 의도적으로 규칙을 위반하고 역방향 프록시를 계속 사용할 것입니다。", "de":="Titel: Google hat mein Konto freigegeben, aber ich werde absichtlich gegen die Regeln verstoßen und weiterhin Reverse-Proxy verwenden。", "fr":="Titre : Google a débloqué mon compte, mais je continuerai à utiliser la reverse proxy en violation des règles。", "it":="Titolo: Google ha rilasciato il mio account, ma continuerò a utilizzare la reverse proxy nonostante le regole."}\|im_end\|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Open Source Introduction】FeedbackHound - Catches Reddit Posts and Generates Chinese Analysis Reports by AI | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: [Open Source Introduction] FeedbackHound - Catches Reddit Posts + AI Generates Chinese Analysis Reports | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: [Open Source Recommendation] FeedbackHound - Capture Reddit Posts + AI Generated Chinese Analysis Report Content: Hello everyone, I'm a beginner coder who has developed a small tool called FeedbackHound with the help of Claude Code. Sharing it for you all to play around with. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Open Source Recommendation] FeedbackHound — Grab Reddit Posts + AI Generated Chinese Analysis Report Content: Fellow experts, hello! A newbie to vibe coding, I whipped up a little tool called FeedbackHound using Claude Code and would love to share it for everyone to play around with. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: [Open Source Self-promotion] FeedbackHound - Grab Reddit Posts + AI Generate Chinese Analysis Report | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: [Open Source Recommendation] FeedbackHound — Scrape Reddit posts + Generate Chinese analysis reports with AI Content: Friendly hello to everyone! I'm a beginner at vibe coding and built a small tool called FeedbackHound using Claude Code. Sharing it for you all to try out. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: [Open Source Recommendation] FeedbackHound — Scrape Reddit Posts + AI-Generated Chinese Analysis Reports Content: Hello everyone! I'm a beginner at vibe coding and built this small tool called FeedbackHound using Claude Code. Sharing it with you all to try out. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Open Source Introduction] FeedbackHound — Grab Reddit Subreddit + AI Generate Chinese Analysis Report | Major<br>[Accuracy]<br>"Grab Reddit Subreddit"<br>Reason: Incorrect - 'Reddit 帖子' means 'Reddit posts', not 'Reddit Subreddit'. A subreddit is the forum itself, not the posts \| The hypothesis only translates the title and omits the content text ('各位佬友好 Vibe coding 小白一枚...'). \| The hypothesis says 'Grab Reddit Subreddit' but should be 'Scrapes Reddit posts'. The source says '抓 Reddit 帖子' which means posts, not subreddits. \| Source says ‘抓 Reddit 帖子’ (grab/scrape Reddit posts), not ‘Subreddit’; wording changes object from posts to subreddits and is inaccurate. |
| google/gemma-3-12b-it | Title: [Open Source Self-Introduction] FeedbackHound - Captures Reddit Posts + AI-Generated Chinese Analysis Reports Content: Hello everyone, I'm a newbie coder with a friendly vibe, and I built a small tool called FeedbackHound using Claude Code to share with you all. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Welcome to FeedbackHound - a tool that analyzes Reddit posts using AI. | Major<br>[Accuracy]<br>"Welcome to FeedbackHound - a tool that analyzes Reddit posts using AI."<br>Reason: Misses that the tool scrapes Reddit posts and generates Chinese analysis reports; adds “Welcome to” which is not in the source and omits “Chinese” in the output description. \| The source is a self-recommendation post ('Sharing...'), not a welcome message. 'Welcome to' changes the intent. \| Incomplete translation - only partial title translation. Missing full title about scraping Reddit posts and generating Chinese reports, and missing content section about Claude Code. |
| google/gemma-3-4b-it | Title: [Open Source Recommendation] FeedbackHound - Capture Reddit Posts + AI Generated Chinese Analysis Report Content: Fellow seniors, friendly vibe! I’m a coding newbie, and I cobbled together a little tool called FeedbackHound using Claude Code to share with everyone. | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: [Open Source Self-Introduction] FeedbackHound - Captures Reddit posts and generates Chinese analysis reports using AI. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: [Open Source Recommendation] FeedbackHound - Scraping Reddit posts + AI-generated Chinese analysis report | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '开源自荐' in en | Major<br>[Accuracy]<br>"Translation of '开源自荐' in en"<br>Reason: Hypothesis is not a translation but a placeholder text indicating translation request \| Placeholder; does not give the explicit meaning '[Open Source Self-Recommendation]'. \| The hypothesis is a meta-description of the task, not the translation of the source title or content. |
| tencent/HY-MT1.5-7B | Assistant | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Handmade GPT Team Slot 3 Content: Drawing theme: Handmade GPT Team Slot 3 ( ) Prize details: GPT Team location * 3 | Minor<br>[Terminology]<br>"Slot"<br>Reason: Should be 'seat' or 'spot' to match reference; 'Slot' is less idiomatic \| '车位/座位' here refers to subscription seats/spots, not generic 'slot'. \| 'Slot' is less precise than 'seat' or 'spot' for a shared subscription context.<br>Major<br>[Accuracy]<br>"Handmade GPT Team Slot 3"<br>Reason: Mistranslation of '手搓' (slang for manually created/configured) as 'Handmade' implies physical crafting rather than software configuration. \| Mistranslation of '车位' as 'Slot' instead of 'shared subscription seat' - the reference correctly translates this as 'shared subscription seats' \| Missing 'shared subscription' concept; '手搓' means manually created/handmade but in context refers to shared/pooled subscriptions, not just handmade \| Missing 'shared subscription' and uses 'Slot' instead of 'seats'. |
| CohereLabs/tiny-aya-water | Title: Handmade GPT Team Slot 3 | Major<br>[Accuracy]<br>"Handmade GPT Team Slot 3"<br>Reason: Missing key information: '3' should be at the beginning, and 'shared subscription seats' is more accurate than just 'Slot'. The hypothesis omits 'shared subscription' concept. \| The term 'Slot' is not as accurate as 'subscription seats' in the reference. \| Mistranslation of '手搓' (slang for manually created/configured) as 'Handmade' implies physical crafting rather than software configuration. \| Omitted key information about lottery theme and that there are 3 shared seats/spots; "slot" is vague and misses the idea of team subscription seats. \| Translation is incomplete - missing 'shared subscription seats' meaning from source '车位' which refers to shared subscription spots |
| Qwen/Qwen2.5-14B-Instruct | Title: Handcrafted GPT Team Subscription Slots 3 Content: Prize Theme: Handcrafted GPT Team Subscription Slots 3 ( ) Prize Details: GPT Team Slot * 3 | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Handcrafted GPT team slot 3 shares Content: Prize theme: Handcrafted GPT team slot 3 shares ( ) Prize details: gpt team slot * 3 | Major<br>[Accuracy]<br>"Handcrafted GPT team slot 3 shares"<br>Reason: Mistranslation: '车位' means subscription seats/shared accounts, not 'slot' or 'shares'. Should be 'shared subscription seats' as in reference. \| Misinterprets “车位3个” and “共享座位/名额” as “3 shares”; should express three GPT Team seats/slots rather than ownership shares. \| Should be '3 manually created GPT Team shared subscription seats' not 'Handcrafted GPT team slot 3 shares'. The term '手搓' means manually created/handmade, and '车位' refers to subscription seats, not generic slots. |
| Qwen/Qwen3-14B | Title: Handmade GPT Team Slot 3 Content: Raffle theme: Handmade GPT Team Slot 3 () Prize details: GPT Team position * 3 | Major<br>[Accuracy]<br>"GPT Team Slot 3"<br>Reason: The term 'Slot' is less accurate compared to 'shared subscription seats' in the reference. \| Translation is too literal. 'Handmade' is not the correct translation of '手搓' which means 'manually created' or 'hand-crafted' in this context of creating GPT Team shared subscription seats. Also 'Slot' should be 'seat' or 'position' to match the reference. \| Missing 'shared subscription' concept; '手搓' implies manually created/handmade but loses the shared subscription seat meaning present in reference \| Misinterprets ‘车位3个’ as a single ‘Slot 3’ instead of ‘3 slots/seats’; number association is wrong. |
| Qwen/Qwen3.5-4B | Title: Handcrafted GPT Team Slot x3 Content: Prize Theme: Handcrafted GPT Team Slot x3 () Prize Details: gpt team position * 3 | Minor<br>[Terminology]<br>"Slot"<br>Reason: Less accurate than "seat" or "spot" for subscription positions \| Inconsistent translation of '车位' - translated as 'slot' but reference uses 'seats' or 'shared subscription seats' \| The term 'Slot' is not as precise as 'shared subscription seats' in the reference.<br>Minor<br>[Terminology]<br>"gpt team position * 3"<br>Reason: Inconsistent translation of '位置' - translated as 'position' but reference uses 'spots' \| In the context of shared accounts, 'position' is less idiomatic than 'spot' or 'seat'. \| Inconsistent capitalization and less natural than "GPT Team spots * 3" \| Should be 'GPT Team spots' not 'gpt team position' for consistency and clarity |
| Qwen/Qwen3.5-9B | Title: Handcrafted GPT Team Slots (3) Content: Draw Theme: Handcrafted GPT Team Slots (3) Prize Details: GPT Team Seats * 3 | Minor<br>[Terminology]<br>"Handcrafted GPT Team Slots (3)"<br>Reason: Inconsistent translation of '车位' - translated as 'slots' but reference uses 'shared subscription seats' \| In the context of shared accounts, 'seats' or 'spots' is more standard than 'slots'. \| Should be 'manually created' or 'shared subscription seats' rather than 'Handcrafted GPT Team Slots' to match reference terminology |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Hand-rolled GPT Team Subscription Slot 3 | Major<br>[Accuracy]<br>"3"<br>Reason: The Chinese '3个' in the title is more naturally rendered as '3 seats' than as a bare trailing '3', which reads slightly unnatural and may leave the quantity reference unclear \| Missing the quantity '3' in the title structure compared to source '车位 3 个' (3 slots), and 'Hand-rolled' is a slightly awkward translation for '手搓' in this context compared to 'manually created' or 'self-built'. Also missing 'Content' translation entirely. \| Translation is incomplete - missing 'shared subscription seats' meaning and '3' should be at the beginning not end of title \| The translation does not fully capture the meaning of 'shared subscription seats' in the reference. \| Missing quantifier '3' at the end; should be 'Hand-rolled GPT Team Subscription Slots 3' or similar. Also 'Subscription Slot' is less accurate than 'shared subscription seats' |
| google/gemma-3-12b-it | Title: Handcrafted GPT Team Slots - 3 Available Content: Raffle theme: Handcrafted GPT Team Slots - 3 Available ( ) Prize details: gpt team slots * 3 | Minor<br>[Terminology]<br>"Handcrafted GPT Team Slots"<br>Reason: While 'Slots' is acceptable, 'Seats' or 'Shared subscription seats' (as in reference) better captures the meaning of '车位' in the context of shared account access. \| Inconsistent translation of '车位' - translated as 'slots' but reference uses 'seats' or 'spots'. Also 'Handcrafted' is not the standard translation of '手搓' in this context (refers to manually created/shared subscription seats). \| Should be 'manually created' or 'hand-rolled' rather than 'Handcrafted'; 'Slots' is less precise than 'seats' or 'shared subscription seats' |
| google/gemma-3-1b-it | Translation of '手搓的GPT team车位3个' in en | Major<br>[Accuracy]<br>"Translation of '手搓的GPT team车位3个' in en"<br>Reason: Missing translation of the content section. \| The hypothesis is a meta-commentary describing the task rather than providing the actual translation of the source text. \| Hypothesis is incomplete and does not provide actual translation. Should translate to something like '3 manually created GPT Team shared subscription seats' as shown in reference. \| Hypothesis is not a translation but a placeholder indicating translation task. Missing actual translation content. \| Not an actual translation; it's a meta-statement about translating the phrase and omits the meaning (manually created GPT Team seats/shared subscription seats). |
| google/gemma-3-4b-it | Title: Handcrafted GPT Team Slots - 3 Seats | Pass (no consensus errors) |
| google/translategemma-12b-it | "en": "Title: 3 Slots Available for the Manually Curated GPT Team\nContent: Prize Draw Theme: 3 Slots Available for the Manually Curated GPT Team ( ) \nPrizes: gpt team slots * 3", "zh-TW": "標題:人工精選 GPT 團隊的 3 個車位\n內容:抽獎主題:人工精選 GPT 團隊的 3 個車位()\n獎品詳情:gpt 團隊名額 * 3", "ja": "タイトル:手作業で選んだGPTチームの3つの枠。\n内容:抽選テーマ:手作業で選んだGPTチームの3つの枠。()\n賞品詳細:gptチームの枠 * 3", "ko": "제목: 수동으로 선별된 GPT 팀의 3개의 자리.\n내용: 추첨 주제: 수동으로 선별된 GPT 팀의 3개의 자리. (\)\n상품 상세: gpt 팀 자리 * 3", "de": "Titel: 3 Plätze für das manuell kuratierte GPT-Team verfügbar.\nInhalt: Gewinnspielthema: 3 Plätze für das manuell kuratierte GPT-Team ().\nPreisdetails: gpt team Plätze * 3", "fr": "Titre : 3 places disponibles pour l'équipe GPT sélectionnée manuellement.\nContenu : Thème du tirage au sort : 3 places disponibles pour l'équipe GPT sélectionnée manuellement ().\nDétails des prix : emplacements d'équipe gpt * 3", "it": "Titolo: 3 posti disponibili per il team GPT selezionato manualmente.\nContenuto: Tema della lotteria: 3 posti disponibili per il team GPT selezionato manualmente ().\nDettagli del premio: posti nel team gpt * 3" } <\|file_separator\|> | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Handcrafted GPT Team Slots - 3 | Minor<br>[Style]<br>"Handcrafted GPT Team Slots - 3"<br>Reason: The term 'Handcrafted' is too literal for '手搓' in this tech context; 'manually created' or 'DIY' would be more appropriate. \| 'Handcrafted' is a literal translation of '手搓' but in this context 'manually created' or 'self-created' would be more appropriate for the slang meaning \| Unnecessary capitalization and slightly awkward phrasing; "GPT team" is more natural lowercase and "car parking slots" or "seats" would better match "车位"/"shared subscription seats".<br>Major<br>[Accuracy]<br>"Handcrafted GPT Team Slots - 3"<br>Reason: Translation is incomplete - missing the 'Content' field which contains the lucky draw theme and prize details. The reference includes both Title and Content, but hypothesis only has Title. \| Missing key concept '共享订阅座位' (shared subscription seats). The hypothesis only translates '手搓' as 'Handcrafted' and '车位' as 'Slots' but loses the 'shared subscription' meaning present in reference. \| The translation is vague compared to the reference which specifies 'manually created' and 'shared subscription seats.' |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 手搓的GPT team车位3个' in en | Critical<br>[Accuracy]<br>"Translation of 'Title: 手搓的GPT team车位3个' in en"<br>Reason: Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. \| The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. \| The hypothesis does not provide a translation but only states the intention to translate. \| Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be '3 manually created GPT Team shared subscription seats'. \| No actual translation is provided; the text only describes that a translation is needed. |
| tencent/HY-MT1.5-7B | "en": "Title: 3 Hand-Crafted GPT Team Parking Spots", "zh-TW": "标题:3个手工制作的GPT团队停车位", "ja": "标题:3つの手作りGPTチーム用駐車スペース", "ko": "제목: 3개의 수작업 GPT 팀용 주차 공간", "de":="Titel: 3 handgefertigte GPT-Team-Parkplätze", "fr": "Titre: 3 places de stationnement pour l'équipe GPT faites à la main", "it": "Titolo: 3 posti auto per il team GPT realizzati a mano"}\|im_end\|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: How to use Wong Public Station in opencode? Content: As the title says, directly using will result in an error... I hope someone who has messed around with it can share some tutorials | Major<br>[Terminology]<br>"Public Station"<br>Reason: '公益站' in this community refers to a free API proxy/site, not a generic 'public station'. \| Missing possessive apostrophe and 'free API proxy'. \| Incorrect translation of 'wong 公益站' - should be 'Wong's free API proxy site' per reference, not 'Wong Public Station' |
| CohereLabs/tiny-aya-water | Title: What is Wong's公益站 and how does it work? | Major<br>[Accuracy]<br>"What is Wong's公益站 and how does it work?"<br>Reason: Complete hallucination; the source asks 'How to use', not 'What is' or 'how does it work'. Also fails to translate '公益站'. \| Incorrect translation - source asks 'how to use' not 'what is' - missing '如何在opencode中使用' which means 'how to use in OpenCode' \| Significantly deviates from source. Source asks 'how to use' not 'what is'. Mixes Chinese and English inappropriately. Should be 'How to use Wong's free API proxy site in OpenCode?' \| Source asks how to use Wong’s public/free site in OpenCode, not what it is or how it works in general; also omits reference to OpenCode and error situation. |
| Qwen/Qwen2.5-14B-Instruct | Title: How to use Wong's free shared API site in opencode? Content: As the title suggests, an error occurs when used directly... Hoping someone experienced can share some tutorials. | Minor<br>[Terminology]<br>"free shared API site"<br>Reason: Inconsistent terminology: 'free shared API site' vs reference 'free API proxy site'. Also 'opencode' should be 'OpenCode' (proper capitalization). \| Original “公益站” in this context refers to a free API proxy site, not just a generic shared API site. \| '公益站' in this specific community context refers to free API proxy sites, not just shared sites. |
| Qwen/Qwen2.5-7B-Instruct | Title: How to Use Wong Public Proxy on OpenCode? Content: As stated, an error occurs when used directly... Hoping that experienced enthusiasts can share some tutorials. | Minor<br>[Terminology]<br>"Public Proxy"<br>Reason: 'Public Proxy' is generic; 'free API proxy site' captures the specific community meaning of '公益站' better. \| Source “wong 公益站” in this context refers to a free API proxy site, not a generic public proxy; established forum term is “free API proxy site” or “free API site”. \| Inconsistent terminology: 'Public Proxy' vs reference 'free API proxy site'. Also missing possessive 's' in 'Wong's'. |
| Qwen/Qwen3-14B | Come si utilizza il sito gratuito di Wong in opencode? | Major<br>[Accuracy]<br>"Come si utilizza il sito gratuito di Wong in opencode?"<br>Reason: Title only; the source also has body text describing errors when using it directly and asking for a tutorial. That content is omitted. \| Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. \| Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. Should be in English or Chinese. \| The translation does not fully capture the context of 'API proxy site' as in the reference. \| The hypothesis only translates the title and completely omits the content paragraph. |
| Qwen/Qwen3.5-4B | Title: How to use Wong's free shared API site on opencode? Content: As stated above, using it directly results in an error... Hope someone who has experimented with this can share a tutorial. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: How to use the free shared API site (wong) on opencode? Content: As asked, when used directly, an error occurs... Hope someone who has experimented can share a tutorial. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | "en": "How to use Wong's Public Proxy in opencode?", "zh-TW": "如何在 Opencode 中使用 Wong 的 公益站?", "ja": "Opencode で Wong の 公益站 を 使用する 方法", "ko": "如何在 Opencode 中使用 Wong 的 公益站?", "de": "Wie man Wong\'s Public Proxy in Opencode verwenden kann?", "fr": "Comment utiliser la station publique de Wong dans Opencode ?", "it": "Come utilizzare la stazione pubblica di Wong in Opencode?" } <\|im_end\|> <\|im_end\|> | Major<br>[Accuracy]<br>"How to use Wong's Public Proxy in opencode?"<br>Reason: Hypothesis contains multiple language translations when only English translation was requested. Also 'Public Proxy' is less accurate than 'free API proxy site' \| Incorrect translation - '公益站' means 'free API proxy site' not 'Public Proxy' \| The hypothesis outputs a JSON object containing multiple languages instead of translating the source text into the target language (zh) as requested. It fails to perform the translation task. |
| google/gemma-3-12b-it | Title: How to use wong public proxy in opencode? | Minor<br>[Terminology]<br>"wong public proxy"<br>Reason: '公益站' is better translated as 'free API proxy site' (as in reference) rather than just 'public proxy', which lacks the specific context of free API access. \| 公益站 here refers to a free API proxy site; omitting 'API' and 'site' under-translates the technical meaning compared to the intended 'free API proxy site'. \| Should be 'Wong's free API proxy site' not 'wong public proxy'; missing possessive and specificity \| Missing possessive 's' in 'Wong's' - reference uses 'Wong's free API proxy site'. Also 'public proxy' is less accurate than 'free API proxy'. \| The term 'proxy' is used instead of 'API proxy site', which is less precise. |
| google/gemma-3-1b-it | As per the title, directly using it will result in an error...I hope there are some experienced folks who have worked on this, sharing some tutorials. | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: How to use the wong public proxy in opencode? | Minor<br>[Terminology]<br>"public proxy"<br>Reason: Source "公益站" in this community usually refers to a free API/proxy site; "public" is vague and misses the nuance of free/shared API proxy. \| The term 'public proxy' is less accurate than 'free API proxy site' in the reference. \| Incorrect translation: 'wong' should be transliterated as 'Wong' (name), and 'public proxy' should be 'free API proxy site' to match the reference terminology |
| google/translategemma-12b-it | Title: How to use the wong public API site in opencode? Content: As the title suggests, using it directly will result in an error... I hope experienced users can share some tutorials. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: How to use the wong public API site in opencode? | Major<br>[Accuracy]<br>"public API site"<br>Reason: Failed to translate '公益站' correctly as 'free API proxy site' or similar; 'public API site' misses the specific community slang meaning. \| Source "wong 公益站" and reference indicate a free API proxy site, not a generic "public API site"; misses the proxy/provided-for-free nuance. \| Incorrect translation of 'wong'. Should be 'Wong's free API proxy site' not 'wong public API site'. The reference clarifies it's a proper noun (Wong's) and specifically a 'free API proxy site'. |
| tencent/HY-MT1.5-1.8B | Translation of '如何在opencode 中使用 wong 公益站?' in en | Minor<br>[Accuracy]<br>"Translation of '如何在opencode 中使用 wong 公益站?' in en"<br>Reason: Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'How to use Wong's free API proxy site in OpenCode?'. \| No translation content is given, only a meta description requesting a translation. \| The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. \| Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. |
| tencent/HY-MT1.5-7B | How to use the Wong Public Welfare Site in OpenCode? | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Free API Site】ztu.ai video generation online | Minor<br>[Terminology]<br>"【Free API Site】"<br>Reason: '公益站' is commonly rendered as 'free API proxy site'; 'Free API Site' loses the proxy nuance. \| Inconsistent terminology - 'Free API Site' vs reference 'Free API proxy site' - missing 'proxy' \| '公益站' is better translated as 'free API proxy site' in this specific community context. |
| CohereLabs/tiny-aya-water | Content: Upload an image and describe your action for ztu.ai's new video generation feature | Major<br>[Accuracy]<br>"Upload an image"<br>Reason: Missing Title field entirely - source has Title '[公益站]ztu.ai 视频生成上线' but hypothesis only has Content \| Incomplete translation. Missing 'one or two images' specification and 'fill in the action description' is more accurate than 'describe your action'. Also lacks title translation. \| The reference specifies 'one or two images', which is missing here. \| Source specifies uploading one or two images; hypothesis allows only a single image. \| Inaccuracy; source specifies 'one or two images' (一张到两张图片), hypothesis simplifies to singular. |
| Qwen/Qwen2.5-14B-Instruct | [Sito gratuito] lancio della generazione video su ztu.ai | Critical<br>[Accuracy]<br>"[Sito gratuito] lancio della generazione video su ztu.ai"<br>Reason: Hypothesis is in Italian, not English. Should be translated to English as '[Free API proxy site] ztu.ai video generation is online' \| Target is Italian, not the requested Chinese, and omits core content (text-to-video, image-to-video, upload images, fill action description). \| The translation is in Italian instead of Chinese, which is a critical error. \| The translation is in Italian, but the target language is Chinese (zh). The entire content is missing. \| The hypothesis is in Italian, not English. The source and reference are in Chinese, and the target language is zh (Chinese). This is a complete mistranslation. |
| Qwen/Qwen2.5-7B-Instruct | Title: [Free Shared API Site] ztu.ai Video Generation Launched Content: ztu.ai Adds Text-to-Video and Image-to-Video Features Upload one to two images, fill in action description | Minor<br>[Terminology]<br>"Free Shared API Site"<br>Reason: Source “公益站” in this forum context is better rendered as “free API proxy site”; “Shared API Site” is understandable but not standard for this domain. \| Terminology mismatch: 'Free Shared API Site' vs reference 'Free API proxy site'. The term '公益站' specifically refers to free API proxy sites in this community context. \| Slightly verbose. 'Free API proxy site' is the established term in this context. |
| Qwen/Qwen3-14B | 【Sito gratuito】ztu.ai Generazione video lanciata | Minor<br>[Accuracy]<br>"【Sito gratuito】"<br>Reason: Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. \| Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. Should be in English or Chinese. \| The hypothesis only translates the title and omits the entire content paragraph describing features. \| ‘公益站’ here refers to a free API proxy site, not just a generic ‘free site’; ‘API proxy’ nuance is missing. |
| Qwen/Qwen3.5-4B | Title: [Free Shared API Site] ztu.ai Video Generation Live Content: ztu.ai now offers Text-to-Video and Image-to-Video features. Upload one to two images and enter action descriptions. | Minor<br>[Terminology]<br>"Free Shared API Site"<br>Reason: '公益站' in this specific tech context usually refers to 'free API proxy sites' rather than just 'shared API sites'. \| Inconsistent translation of '公益站' - hypothesis uses 'Free Shared API Site' while reference uses 'Free API proxy site' \| 公益站 in this context is better rendered as "free API proxy site"; "shared API" is less standard |
| Qwen/Qwen3.5-9B | Title: [Free Shared API Site] ztu.ai Video Generation Now Live Content: ztu.ai adds text-to-video and image-to-video features. Upload one to two images and enter action descriptions. |
轻微
[术语]
"[Free Shared API Site]"
理由: '公益站' in this context is better translated as 'free API proxy site' to distinguish from general shared APIs. | The source “公益站” is usually rendered as “free API proxy site” in this context; the hypothesis departs from the reference, a minor terminology deviation. | Should be 'Free API proxy site' to match reference | Changed 'Free API proxy site' to 'Free Shared API Site' - terminology inconsistency with reference |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Public Proxy] ztu.ai Video Generation Launch |
轻微
[术语]
"[Public Proxy]"
理由: '公益站' is consistently translated as 'Free API proxy site' in other references; 'Public Proxy' is less precise in this specific community context. | 'Public Proxy' is inaccurate; should be 'Free API proxy site' or similar. 'Launch' is less precise than 'is online' | The reference renders “公益站” more specifically as “[Free API proxy site]”; 'Public Proxy' is too generic and does not convey that it is an API proxy site. |
| google/gemma-3-12b-it | Title: [Free API Site] ztu.ai Video Generation Launched Content: ztu.ai has added text-to-video and image-to-video features. Upload one to two images and fill in action descriptions. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of '【公益站】ztu.ai 视频生成上线' in en |
严重
[准确性]
"Translation of '【公益站】ztu.ai 视频生成上线' in en"
理由: Missing translation of the content section. | The hypothesis is a meta-commentary describing the task rather than providing the actual translation of the source text. | Hypothesis is incomplete and does not provide actual translation. Should translate to '[Free API proxy site] ztu.ai video generation is online' as shown in reference. | Not a translation but a meta-text; does not render the meaning of the Chinese title or mention free API proxy site or video generation going online. | Hypothesis is not a translation but a placeholder indicating translation task. Missing actual translation content. |
| google/gemma-3-4b-it | Title: [Free Shared API Site] ztu.ai Video Generation Launched Content: ztu.ai added text-to-video and image-to-video features. Upload one to two images and fill in the action description. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [Public Proxy] ztu.ai Video Generation is Now Available Content: ztu.ai has added text-to-video and image-to-video features. Upload one to two images and fill in the action description. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: 【Free Shared API Site】ztu.ai Video Generation Launched |
轻微
[术语]
"【Free Shared API Site】"
理由: 'Free API proxy site' is the consistent terminology used in the reference for '公益站'. | 'Free Shared API Site' is a literal translation. The reference uses 'Free API proxy site' which is more idiomatic. The term '公益站' in this context means 'free API proxy site' not 'free shared API site' | Translated as 'Free Shared API Site' but reference uses 'Free API proxy site'. While similar, 'proxy site' is more accurate terminology. |
| tencent/HY-MT1.5-1.8B | Translation of '公益站】ztu.ai 视频生成上线' in en |
轻微
[准确性]
"Translation of '公益站】ztu.ai 视频生成上线' in en"
理由: The hypothesis does not contain a translation, only a meta reference to translation. | Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be '[Free API proxy site] ztu.ai video generation is online'. |
| tencent/HY-MT1.5-7B | Title: [Public Welfare Site] ztu.ai Video Generation Service Launched | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Gemma 4b does not fall in love with the mysterious you circle black dialect | IF. Data Practical Test |
严重
[准确性]
"Gemma 4b does not fall in love with the mysterious you circle black dialect"
理由: Complete failure to translate the slang '聪明哈基米' (Smart Hakimi/cute cat) and '你圈黑话' (your circle's slang/jargon). The output is nonsensical. | Incorrect translation of '聪明哈基米' and context. | Mistranslation of '聪明哈基米'; should reference 'Smart Hakimi' (a cute cat reference), not just 'Gemma 4b' | Mistranslation of '哈基米' as 'Gemma' - reference correctly identifies it as 'Hakimi (cute cat)'; also '你圈' translated as 'you circle' instead of 'your circle' | Misparses '聪明哈基米' (a nickname, not Gemma 4b) and '你圈黑话' (your circle's slang); overall meaning is distorted.
严重
[准确性]
"missing explicit 'TranslateGemma 4b' in title"
理由: Title omits 'TranslateGemma 4b' which is part of the source title | The model name 'TranslateGemma 4b' is part of the title in the source but is not clearly preserved at the end. | Missing 'TranslateGemma' in hypothesis - reference shows full name 'TranslateGemma 4b' |
| CohereLabs/tiny-aya-water | Title: Clever Hakiemi Won't Fall for Mysterious Your Circle Black Talk | IF. Data Practical Test TranslateGemma 4b |
严重
[准确性]
"Clever Hakiemi"
理由: Mistranslation of '哈基米': should be 'Hakimi (cute cat)' not 'Hakiemi'. 'Fall for' is incorrect; should be 'fall in love with'. Grammar is awkward. | Name "Hakimi" is mistransliterated as "Hakiemi". | Incorrect translation of '哈基米' - this is internet slang for 'cute cat/hedgehog' not a name 'Hakiemi' |
| Qwen/Qwen2.5-14B-Instruct | Title: Clever Hakimi Wouldn't Fall for Mysterious You Circle Slang | IF. Data Practical Evaluation TranslateGemma 4b Content: Clever Hakimi Wouldn't Fall for Mysterious You Circle Slang | IF. Data Practical Evaluation TranslateGemma 4b |
严重
[准确性]
"Mysterious You Circle Slang"
理由: '你圈' is slang for 'your circle' (the specific community the user belongs to). Translating it as 'You Circle' makes it sound like a proper noun or a strange entity. | Missing context that 'Hakimi' refers to a cute cat; should be 'Smart Hakimi (cute cat)' or similar | “神秘你圈黑话” means “your circle’s mysterious slang”; “You Circle” as a proper noun is misleading. |
| Qwen/Qwen2.5-7B-Instruct | Clever Haji米 Will Not Fall for Your Mysterious Lingo | IF. Data Practical Evaluation TranslateGemma 4b |
严重
[准确性]
"Haji米"
理由: Incorrect transliteration of “哈基米 (Hakimi)”; should be “Smart Hakimi” to match the name reference in source and reference. | Mistranslation: '哈基米' is a term for cute cats (harking from Japanese), not 'Haji米'. Reference correctly identifies as 'Hakimi (cute cat)'. | Failed to translate the slang term '哈基米' (Hakimi/cute cat), leaving mixed characters. | Should be 'Hakimi' as in the reference. | Incorrect translation. Should be 'Smart Hakimi (cute cat) won't fall in love with your circle's mysterious slang'. The hypothesis misses the context and cultural reference. |
| Qwen/Qwen3-14B | Title: Smart HAKIMI won't fall for the mysterious jargon of your circle | IF. Data Practical Evaluation Translate Gemma 4b Content: Smart HAKIMI won't fall for the mysterious jargon of your circle | IF. Data Practical Evaluation Translate Gemma 4b |
轻微
[准确性]
"jargon of your circle"
理由: Missing context that 'HAKIMI' refers to a cute cat (哈基米); also 'fall for' should be 'fall in love with' to match reference | ‘你圈黑话’ is specifically ‘your circle’s mysterious slang’; ‘jargon’ is close but loses the ‘mysterious’/in-group slang nuance slightly. | Incorrect translation of '神秘你圈黑话'. The reference correctly translates as 'your circle's mysterious slang'. The hypothesis incorrectly translates '你圈' as 'your circle' but the meaning is lost in context. | The term 'HAKIMI' should be lowercase as in the reference. |
| Qwen/Qwen3.5-4B | Title: Smart Hahimi Won't Fall for Your Mysterious Jargon | IF. Data Practical Review TranslateGemma 4b Content: Smart Hahimi Won't Fall for Your Mysterious Jargon | IF. Data Practical Review TranslateGemma 4b |
严重
[准确性]
"Hahimi"
理由: The reference specifies 'Hakimi (cute cat)', which is omitted in the hypothesis. | Should be 'Smart Hakimi (cute cat)' - 'Hahimi' is incorrect transliteration and missing the cat reference | Failed to recognize '哈基米' as a specific internet slang term for cute cats (often associated with a meme), translating it phonetically instead of conveying the meaning or adding context. | Mistransliteration of 哈基米; standard is "Hakimi" | Incorrect translation of '哈基米' - hypothesis uses 'Hahimi' but reference correctly uses 'Hakimi (cute cat)' which is the actual meaning
轻微
[准确性]
"your mysterious jargon"
理由: Should be 'won't fall in love with your circle's mysterious slang' - meaning is significantly different | Misses "your circle's" nuance in 你圈黑话 (in-group slang) and weakens the social context | Missed the nuance of '你圈' (your circle/community), translating it as a generic 'your'. |
| Qwen/Qwen3.5-9B | Smart HakiMi won't fall for mysterious jargon from your circle | IF. Data Practical Evaluation TranslateGemma 4b |
严重
[准确性]
"HakiMi"
理由: Incorrect translation: '哈基米' is a term for cute cats (from Japanese), not 'HakiMi'; reference correctly translates as 'Hakimi (cute cat)' | Fails to translate the slang '哈基米' (Hakimi/cute cat) and '黑话' (slang/jargon) accurately, losing the cultural reference and specific meaning. | The proper noun should be “Hakimi”; the spelling is wrong but does not cause misunderstanding. | Missing context that 'Hakimi' refers to a cute cat; should be 'Smart Hakimi (cute cat) won't fall in love with your circle's mysterious slang' | Incorrect spelling of 'Hakimi' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Bright Hackyami Won't Fall for Mysterious You | IF. Data Practice Evaluation TranslateGemma 4b |
严重
[准确性]
"Bright Hackyami Won't Fall for Mysterious You"
理由: Misinterprets '你圈' (your circle/community) as 'You'. Should be 'your circle's mysterious slang'. | The translation does not fully capture the context of 'mysterious slang' in the reference. | Incorrect translation of '哈基米' - should be 'Hakimi' (a cute cat nickname), not 'Hackyami' | 'Hackyami' is unclear; should be 'Hakimi (cute cat)'. 'Mysterious You' is inaccurate; should be 'your circle's mysterious slang' |
| google/gemma-3-12b-it | Title: Clever Hakimi won't fall for your mysterious circle slang | IF. Data practical evaluation TranslateGemma 4b Content: Clever Hakimi won't fall for your mysterious circle slang | IF. Data practical evaluation TranslateGemma 4b | 通过 (无共识错误) |
| google/gemma-3-1b-it | Smart Hakiami won't fall for mysterious you circle black talk | IF. Data Real-Time Review TranslateGemma 4b |
严重
[准确性]
"Smart Hakiami"
理由: Hypothesis transliterates as 'Hakiami' but reference shows 'Hakimi (cute cat)' - missing the cultural context and annotation. | Mistranslation of the name/nickname “哈基米”, which should be “Hakimi”; also misses the nuance of ‘cute cat’ from the reference. | Incorrect translation: '哈基米' is a term for cute cats (from Japanese), not a name 'Hakiami'. '你圈' means 'your circle' not 'you circle'. '黑话' means 'slang/jargon' not 'black talk'.
轻微
[准确性]
"Data Real-Time Review"
理由: Mistranslation of '实战' (practical/combat) as 'Real-Time'; 'Practical Evaluation' is the correct context. | Hypothesis uses 'Real-Time Review' but reference uses 'Practical Evaluation' which better matches '数据实战评测'. | Source/ref conveys “数据实战评测” as “Data Practical Evaluation”; “Real-Time Review” changes nuance slightly from practical/hands-on evaluation. |
| google/gemma-3-4b-it | Smart Hakim will not fall in love with you in the mysterious circle black talk | IF. Data practical assessment Translate Gemma 4b |
严重
[准确性]
"you in the mysterious circle black talk"
理由: Incorrect translation: '哈基米' is a term for cute cats (猫咪), not 'Hakim'. '你圈' means 'your circle', not 'you in the mysterious circle'. '黑话' means 'slang/jargon', not 'black talk' | Misinterprets "你圈黑话" which refers to insider slang of your circle, not 'you in the mysterious circle' or 'black talk' literally. | '黑话' means 'slang', 'jargon', or 'insider talk'. 'Black talk' is a literal and incorrect translation that changes the meaning. | Multiple errors: 'Hakim' should be 'Hakimi'; 'you in the mysterious circle black talk' is grammatically incorrect and doesn't match reference 'your circle's mysterious slang'; missing context that this refers to a cute cat. |
| google/translategemma-12b-it | Title: Smart Hakimi won't fall for your mysterious jargon | IF. Data practical evaluation TranslateGemma 4b Content: Smart Hakimi won't fall for your mysterious jargon | IF. Data practical evaluation TranslateGemma 4b |
严重
[准确性]
"won't fall for your mysterious jargon"
理由: '你圈黑话' means 'slang specific to your circle/community'. 'Your mysterious jargon' is slightly off; 'your circle's mysterious slang' is more accurate. | Incorrect translation: '哈基米' is a Chinese internet slang for 'cute cat' (猫咪), not 'Hakimi' (a name). Reference correctly translates it as 'Hakimi (cute cat)' | Should be 'won't fall in love with your circle's mysterious slang' per reference. 'fall for' and 'jargon' are incorrect translations. |
| google/translategemma-4b-it | Smart Hakim will not fall in love with you in the mysterious circle black jargon | IF. Data practical assessment Translate Gemma 4b |
严重
[准确性]
"you in the mysterious circle black jargon"
理由: Misrenders "神秘你圈黑话"; should be "your circle's mysterious slang" or similar, not "you in the mysterious circle". Also "black jargon" is unnatural for "黑话" here. | Severe mistranslation of the sentence structure and slang '你圈黑话'. The output is nonsensical. | Incorrect translation - '神秘你圈' should be 'your circle's mysterious' (referring to a specific community/slang), not 'you in the mysterious circle' | Mistranslation of '哈基米' and overall phrase structure. Should be 'Smart Hakimi (cute cat) won't fall in love with your circle's mysterious slang'. The hypothesis misses the cultural reference and has awkward phrasing. |
| tencent/HY-MT1.5-1.8B | Login |
致命
[准确性]
"Login"
理由: The hypothesis content 'Login' is completely unrelated to the source text, resulting in a total failure to translate. | Hypothesis is completely wrong - it only says 'Login' while the source and reference are about 'Smart Hakimi won't fall in love with your circle's mysterious slang | IF. Data Practical Evaluation of TranslateGemma 4b'. This is a major mistranslation or placeholder. | Completely unrelated to the source and reference; full mistranslation/omission of the original content. | The hypothesis does not match the source or reference content. | Hypothesis 'Login' is completely unrelated to the source content about TranslateGemma 4b evaluation. This is not a valid translation. |
| tencent/HY-MT1.5-7B | "en": "Title: Smart Hakimi Won't Fall for Mysterious Jargon in Your Circle | IF. Data-Driven Evaluation TranslateGemma 4b", "zh-TW": "标题:聰明的哈基米不会愛上你們圈子的神秘黑話 | IF. 數據實戰評測 TranslateGemma 4b", "ja": "タイトル:賢いハキミはあなたたちのサークルの神秘的なジャーゴンに夢中にならない | IF. データ駆動型評価 TranslateGemma 4b", "ko": "제목: 똑똑한 하키미는 여러분의 서클의 신비로운 속어에 반하지 않을 것입니다 | IF. 데이터 기반 평가 TranslateGemma 4b", "de":="Titel: Der kluge Hakimi wird sich nicht in die mysteriösen Jargons eures Kreises verlieben | IF. Datengetriebene Bewertung TranslateGemma 4b", "fr":="Titre : Le sage Hakimi ne tombera pas amoureux des jargons mystérieux de votre cercle | IF. Évaluation basée sur les données TranslateGemma 4b", "it":="Titolo: Il saggio Hakimi non si innamorerà dei gergi misteriosi del vostro circolo | IF. Valutazione basata sui dati TranslateGemma 4b"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: [Open Source Self-Promotion] Don't Let Centipede Crabs Turn Your Computer Into a Garbage Field! |
致命
[准确性]
"Centipede Crabs"
理由: '小龙虾' is crayfish/lobster and here a pun on 'OpenClaw'; 'centipede crabs' is a mistranslation. | Missing translation of the Content section. Also, 'Centipede Crabs' is a literal mistranslation of '小龙虾' which refers to the tool 'OpenClaw' in this context. | Completely wrong translation; '小龙虾' (small shrimp/OpenClaw) mistranslated as 'Centipede Crabs'
严重
[术语]
"Centipede Crabs"
理由: Misnomer for 'OpenClaw'. | Mistranslation of '小龙虾' as 'Centipede Crabs' - reference correctly uses 'OpenClaw'; '垃圾场' as 'Garbage Field' vs reference 'dumpster' | Failed to identify '小龙虾' as the specific tool name 'OpenClaw', translating it literally instead. |
| CohereLabs/tiny-aya-water | Title: [Open Source Self-Promotion] Don't Let Shrimp Turn Your Computer into a Garbage Field! |
严重
[准确性]
"Shrimp"
理由: "小龙虾" here refers to the specific tool "OpenClaw", not generic shrimp; translation loses the proper name and technical reference. | Literal translation of '小龙虾' (Crayfish/Lobster), which is a nickname for the software 'OpenClaw'. Should be translated as the proper noun or kept as OpenClaw based on context. | Incorrect translation of '小龙虾': should be 'OpenClaw' not 'Shrimp'. 'Garbage Field' should be 'dumpster'. This is a critical mistranslation of the main subject. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Self-Recommendation] Don't let 'xiaolongxiao' turn your computer into a landfill! Content: Since various generous people have donated, numerous tools have emerged... While enjoying the convenience of AI programming and using open-source libraries, we unwittingly find our computers riddled with issues! | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Open Source Recommendation] Don't Let Xiaolongbao Turn Your Computer Into a Junkyard! Content: Since benevolent people have generously donated, various tools have emerged one after another... While enjoying the convenience of AI programming and using open-source library tools, our computers have long been riddled with problems! |
严重
[准确性]
"Xiaolongbao"
理由: Literal translation of '小龙虾' (crayfish) as the food 'Xiaolongbao' (soup dumpling) or just food item, missing the context that it refers to the tool 'OpenClaw'. | Severe mistranslation. “小龙虾” in this context refers to OpenClaw or a related AI tool, not “xiao long bao” (小笼包, soup dumplings). Turning the subject from software into a food item completely misleads readers about the topic. | Should be 'Don't let OpenClaw turn your computer into a dumpster'. '小龙虾' (xiaolongxia) is a slang term for OpenClaw, not xiaolongbao (soup dumplings). | Mistranslation: '小龙虾' is the Chinese nickname for 'OpenClaw', not the food 'Xiaolongbao' (steamed buns). Reference correctly uses 'OpenClaw'. |
| Qwen/Qwen3-14B | Title: [Self-promotion] Don't let your computer become a junkyard because of LinuxDo! |
轻微
[准确性]
"because of LinuxDo"
理由: The term 'LinuxDo' is not mentioned in the source or reference. | ‘小龙虾’ here refers to ‘OpenClaw’, not LinuxDo; wrong entity and meaning. | The source says '小龙虾' (Little Crayfish/OpenClaw), a specific tool. The hypothesis incorrectly translates this as 'LinuxDo', which is the name of the forum community, causing a complete meaning error. | Completely incorrect translation. '小龙虾' (OpenClaw) is mistranslated as 'LinuxDo'. Should be 'OpenClaw' or 'Open Claw' per reference |
| Qwen/Qwen3.5-4B | Title: [Self-Recommended] Don't let crayfish turn your computer into a landfill! Content: Since generous benefactors have donated funds, various tools have emerged one after another... While enjoying the convenience of AI programming and using open-source library tools, without realizing it, our computers have already become riddled with holes! |
严重
[准确性]
"crayfish"
理由: Failed to identify '小龙虾' as a specific nickname for the tool 'OpenClaw', translating it literally as the animal. | 小龙虾 here is a playful name for OpenClaw, not literal crayfish; loses key product reference | Should be 'OpenClaw' not 'crayfish'; this is a proper noun/product name
轻微
[准确性]
"riddled with holes"
理由: The source '千疮百孔' metaphorically means messy/cluttered with issues, not literally full of holes. | 千疮百孔 here is figurative for messy/cluttered system; "full of holes" implies security damage rather than clutter and junk files | Incorrect translation - 'riddled with holes' vs reference's 'riddled with clutter' - changes meaning significantly |
| Qwen/Qwen3.5-9B | Title: [Open Source Recommendation] Don't let Xiaolongxia turn your computer into a junkyard! Content: Since generous benefactors have poured in funds, various tools have emerged one after another... While enjoying the convenience of AI programming and open-source library tools, we've unknowingly left our computers riddled with holes! |
严重
[术语]
"Xiaolongxia"
理由: Should be 'OpenClaw' - this is a mistranslation of the Chinese term | The source “小龙虾” is the community's nickname for “OpenClaw” and should be rendered uniformly as OpenClaw per the reference; rendering it literally leaves the product reference unclear. | Transliterates '小龙虾' which refers to the specific tool 'OpenClaw' in this community context; should be translated as the tool name. | Incorrect translation: '小龙虾' is the Chinese name for 'OpenClaw', not a direct transliteration; reference correctly uses 'OpenClaw'
轻微
[准确性]
"riddled with holes"
理由: Literal translation of '千疮百孔'; 'riddled with clutter' or 'mess' fits the context of a messy computer better. | The source “千疮百孔” is a metaphor here for a messy, cluttered system; the reference has “riddled with clutter”, whereas "holes" leans toward vulnerabilities or damage, so the meaning drifts slightly. | Changed 'riddled with clutter' to 'riddled with holes' - reference uses 'clutter' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Open Source Recommendations: Don't Let Shrimping Your Computer Turn It Into a Garbage Heap! |
严重
[准确性]
"Shrimping Your Computer"
理由: Severe mistranslation of '小龙虾' (Little Lobster/Crayfish), which is a nickname for the tool 'OpenClaw'. 'Shrimping' changes the meaning entirely and loses the specific tool reference. | The translation changes the term 'OpenClaw' to 'Shrimping' and alters the metaphor. | Incorrect translation of '小龙虾' - should be 'OpenClaw' (the software name), not 'Shrimping' | Misreads the “小龙虾” meme as “Shrimping Your Computer”, completely changing the original meaning, which refers specifically to OpenClaw; readers cannot connect it to the tool or the meme. | 'Shrimping Your Computer' is nonsensical; should be 'OpenClaw' or 'Don't let OpenClaw turn your computer into a dumpster' |
| google/gemma-3-12b-it | Title: [Open Source Self-Introduction] Don't Let Crawfish Turn Your Computer Into a Dump! |
严重
[术语]
"Crawfish"
理由: The term 'Crawfish' is used instead of 'OpenClaw', which is a critical error as it misrepresents the subject. | 小龙虾 here is wordplay on 'OpenClaw', a specific project; translating literally as 'Crawfish' loses the reference to the tool mentioned in similar items. | 'Crawfish' is incorrect - reference uses 'OpenClaw' which is the actual product name. '小龙虾' is a nickname/translation of OpenClaw, not literal crawfish. | Should be 'OpenClaw' not 'Crawfish'; this is a specific tool name |
| google/gemma-3-1b-it | Title: 【Open Source Self-Recommend】Don't let small shrimp turn your computer into a landfill! |
严重
[术语]
"small shrimp"
理由: Failed to translate the slang nickname '小龙虾' (OpenClaw) literally, losing the specific reference to the tool. | '小龙虾' is a nickname for 'OpenClaw' (a tool), not literal 'small shrimp'. Should be 'OpenClaw' not 'small shrimp'. | Incorrect translation of '小龙虾' as 'small shrimp' instead of 'OpenClaw'. |
| google/gemma-3-4b-it | Title: [Open Source Recommendation] Don’t let crayfish turn your computer into a junkyard! |
严重
[准确性]
"crayfish"
理由: Misinterprets "小龙虾" here, which contextually refers to the OpenClaw project/brand, not literal crayfish; loses the intended product reference. | Incorrect translation of '小龙虾'. Reference shows this should be 'OpenClaw' - a proper name/tool, not a literal translation of the Chinese term. | '小龙虾' (Little Lobster/Crayfish) is a nickname for the software 'OpenClaw'. Translating it literally as 'crayfish' loses the reference to the specific tool. |
| google/translategemma-12b-it | Title: [Open Source Recommendation] Don't let crawfish turn your computer into a junkyard! |
严重
[准确性]
"crawfish"
理由: Critical mistranslation: '小龙虾' here is a nickname/abbreviation for 'OpenClaw' (a software tool), not the literal 'crawfish' or 'crayfish'. Reference correctly identifies it as 'OpenClaw' | Should be 'OpenClaw' per reference. '小龙虾' is a slang term for OpenClaw, not literal crawfish. | Should be 'OpenClaw' as per reference. | '小龙虾' here refers to 'OpenClaw' (a specific tool/project nickname), not the animal. Translating as 'crawfish' is a critical entity error. |
| google/translategemma-4b-it | "en": "Title: [Open Source Recommendation] Don't let crayfish turn your computer into a trash can!", "zh-TW": "標題:[開源推薦] 別讓龍蝦把你的電腦變成垃圾場!", "ja": "タイトル:[オープンソース推奨] クライワームがあなたのコンピューターをゴミ箱に変えないで!", "ko": "제목: [오픈 소스 추천] 크레이프가 당신의 컴퓨터를 쓰레기통으로 만들지 마세요!", "de": "Titel: \[Open-Source-Empfehlung] Lassen Sie keine Krebs nicht, dass Ihr Computer zu einem Müllcontainer wird!", "fr": "Titre : \[Recommandation Open Source] Ne laissez pas les crevettes transformer votre ordinateur en poubelle !", "it": "Titolo: \[Raccomandazione Open Source] Non lasci che i granchi trasformino il tuo computer in un bidone di rifiuti !" } <|im_end|> |
严重
[准确性]
"crayfish / 龍蝦 / クライワーム / 크레이프 / Krebs / crevettes / granchi"
理由: The metaphorical "小龙虾" refers to the product "OpenClaw" in this community context, not literal seafood; translating as various crustaceans misidentifies the referent. | Failed to identify '小龙虾' as a specific slang term (likely for OpenClaw); literal translation 'crayfish' is incorrect in context. | Incorrect translation of '小龙虾'. Should be 'OpenClaw' (a product name/slang term) not 'crayfish'. The hypothesis provides multiple language translations instead of focusing on the English translation. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '小龙虾' with LinuxDo community context.", "zh-TW": "自從各種善心人士慷慨解囊之後,各種工具層出不窮……在利用開源庫進行人工智能編程的過程中,我們的電腦不知不覺中已经變得破敗不堪!", "ja": "LinuxDoコミュニティの文脈での‘小龍蝦’の翻訳。”, "ko": "각종 선한 사람들이 기부를 하면서 다양한 도구가 등장했습니다… AI 프로그래밍에서 오픈소스 라이브러리를 사용하는 편리함과 함께, 우리는 모르는 사이에 컴퓨터가 이미 망가져 버렸습니다!"}, "de": "Über die Community von LinuxDo wird gesagt, dass ‚Krebsflügel‘ den Computer zu einem Müllhaufen machen kann!", "fr": "Dans le contexte de la communauté LinuxDo, on dit que ‘Krabbe’ peut transformer l’ordinateur en un tas de déchets!", "it": "Nella comunità di LinuxDo si dice che ‘Crustaceo’ può trasformare il computer in una discarica."} |
严重
[准确性]
""en": "Translation of '小龙虾' with LinuxDo community context.""
理由: Hypothesis is a placeholder text with JSON format rather than an actual translation. The reference shows the correct translation should be 'OpenClaw'. | Hypothesis provides multilingual metadata and context notes rather than a direct English translation of the title. The format is incorrect and does not match the expected translation output. | The hypothesis is a malformed JSON object containing meta-commentary and translations into unrelated languages instead of the required English translation of the source. | Does not actually translate the title; just references translation with context. | The hypothesis does not match the reference content. |
| tencent/HY-MT1.5-7B | "en": "Title: [Open Source Recommendation] Don’t Let Lobsters Turn Your Computer into a Dumpyard!", "zh-TW": "Title: [開源推薦] 別讓小龍蝦把你的電腦變成垃圾場!", "ja": "Title: [オープンソース推薦] 小龍蝦によってあなたのコンピュータがゴミ捨て場にならないようにしてください!", "ko": "Title: [오픈소스 추천] 소형 갑각류로 인해 컴퓨터가 쓰레기장이 되지 않도록 하세요!", "de":="Title: [Offenen Source Empfehlung] Lassen Sie nicht zu, dass Garnelen Ihr Computer in eine Mülldeponie verwandeln!", "fr":="Title: [Recommandation Open Source] Ne laissez pas les crevettes transformer votre ordinateur en décharge!", "it":="Title: [Consiglio Open Source] Non lasciare che le gamberetti trasformino il tuo computer in una discarica!"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: NPC Free API Invitation Code Daily Update |
轻微
[术语]
"Free API"
理由: Incomplete translation - missing 'proxy site' in hypothesis vs reference 'NPC free API proxy site' | Context suggests 'free API proxy site' is the intended meaning for '公益站'. | '公益站' is a community term meaning 'free API proxy site'; 'Free API' alone omits 'site/proxy' and is ambiguous.
严重
[准确性]
"NPC Free API Invitation Code Daily Update"
理由: Missing translation of the Content section entirely. | Should be 'Daily sharing of invitation codes for NPC free API proxy site' to match reference | Missing 'sharing' and 'proxy site'. |
| CohereLabs/tiny-aya-water | Title: NPC Free API Invitation Code Daily Update |
轻微
[术语]
"NPC Free API"
理由: Should be 'Daily sharing of invitation codes' not 'Daily Update'. Missing 'for NPC free API proxy site' specification. | While 'NPC' is kept, the context of 'Free API proxy site' (公益站) is partially lost or abbreviated compared to reference. | Should be 'Free API proxy site' to match source terminology '公益站' | The term "公益站" is more accurately rendered as "free API proxy site"; "Free API" alone is imprecise.
轻微
[准确性]
"Daily Update"
理由: Title omits that these are invitation codes being shared/updated daily; should explicitly mention invitation codes. | Omission of the 'sharing' aspect and the specific content about generating codes and the request to be reminded ('poke me'). | The reference uses 'sharing' instead of 'Update'. | Incomplete translation - missing '公益站' meaning 'free API proxy site' in the title |
| Qwen/Qwen2.5-14B-Instruct | Title: Community Free Shared API Site Invitation Code Daily Share Content: Thanks to the invitation code from a friend, I can now generate my own invitation codes. From now on, I will update the invitation codes daily; if you forget yours, feel free to kick me. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: NPC Free API Invitation Code Daily Share Content: Thanks to friends' invitation codes, we can finally generate them now. Invitations will be updated daily; if you forget, just kick me. |
轻微
[术语]
"NPC Free API Invitation Code"
理由: The source “NPC公益站” means “NPC free API proxy site”; omitting “proxy site” changes the meaning of the established site name, and “Free API” alone is not specific enough. | Terminology: 'Free API' vs reference 'free API proxy site'. The community term '公益站' specifically means free API proxy site. | Missing 'proxy site' which clarifies the nature of the service. |
| Qwen/Qwen3-14B | Codice di invito del sito proxy NPC da condividere ogni giorno |
严重
[准确性]
"Codice di invito del sito proxy NPC da condividere ogni giorno"
理由: The hypothesis only translates the title and omits the entire content paragraph. | Captures the title idea but omits body information about being able to generate invitation codes thanks to a friend, daily updates, and a reminder request; content not translated. | Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. | The term 'Codice di invito' is less precise than 'invitation codes' in the reference. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Daily sharing of free shared API site invitation codes Content: Thanks to my friends' invitation codes, I can finally generate one now. I will update daily. If you forget, feel free to ask me. |
轻微
[术语]
"free shared API site"
理由: Less precise than "free API proxy site" for 公益站 | Should be 'free API proxy site' to match the specific domain terminology for '公益站'. | Inconsistent translation of '公益站' - hypothesis uses 'free shared API site' while reference uses 'free API proxy site' |
| Qwen/Qwen3.5-9B | Title: Free Shared API Site Invitation Code Daily Share Content: Thanks to the invitation code from our friends, we can finally generate invitation codes now. We will update them daily; if you forget, feel free to ping me. |
轻微
[术语]
"Free Shared API Site"
理由: In this context “公益站” is consistently rendered as “free API proxy site”; "shared API site" is inconsistent terminology. | Should be 'NPC free API proxy site' to match reference | Changed 'NPC free API proxy site' to 'Free Shared API Site' - terminology inconsistency | Should be 'free API proxy site' to match community terminology for '公益站'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Free Shared API Site Daily Invites Content: Thanks to friends' invites, I can now generate invites daily. From now on, updates will be made every day. If you forget, please remind me. |
轻微
[术语]
"Free Shared API Site"
理由: '公益站' is better translated as 'free API proxy site' based on context consistency. | The source “NPC公益站” is consistently rendered as “free API proxy site” in the reference; “Shared API Site” does not convey the proxy aspect and is inconsistent terminology. | Inconsistent terminology - 'invites' vs reference 'invitation codes' |
| google/gemma-3-12b-it | Title: Free API Site Invitation Code Daily Sharing Content: Thanks to the invitation code from a friend, I can finally generate invitation codes. I will update the invitation codes daily going forward. If you forget, feel free to message me. |
轻微
[术语]
"Free API Site"
理由: Reference uses 'NPC free API proxy site' - 'NPC' is a specific site name that should be preserved. Also 'friend' should be 'bro' to match the slang style. | Should be 'free API proxy site' to match the specific community terminology for '公益站'. | 公益站 in this domain is usually 'free API proxy site'; 'site' alone misses the 'proxy/API' nuance seen in references. | Should be 'NPC free API proxy site' to maintain specificity |
| google/gemma-3-1b-it | Thanks to the friend's invitation code, now you can generate an invitation code. In the future, the invitation code will be updated daily, and if you forget, you can kick me. |
轻微
[风格]
"Thanks to the friend's invitation code"
理由: Translation is acceptable but 'friend' is too formal; 'bro' is more appropriate for the context. 'NPC公益站' should be translated as 'NPC free API proxy site'. | Missed the community slang nuance of '佬友' (bro/expert friend), translating it generically as 'friend'. | “佬友” is more like ‘bro’/‘buddy’, not specifically “friend”; slightly off in tone but meaning preserved. | Hypothesis uses 'the friend' but reference uses 'a bro' which better matches the casual tone of '佬友'. |
| google/gemma-3-4b-it | NPC Free Shared API Site Invitation Code Daily Sharing |
轻微
[术语]
"Free Shared API Site"
理由: "公益站" in this context is more specifically a free API proxy site; "shared API site" is less precise than "API proxy" used in reference. | '公益站' in this context is consistently translated as 'free API proxy site' in references. 'Shared API site' is slightly off. | Reference uses 'NPC free API proxy site' which is more accurate terminology than 'Free Shared API Site'. |
| google/translategemma-12b-it | Title: Daily Sharing of NPC Public Proxy Invitation Codes Content: Thanks to the invitation codes from our friends, we can now generate invitation codes. We will update the invitation codes daily. If you forget, feel free to message me. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Daily Sharing of NPC Free Shared API Site Invitation Code |
轻微
[术语]
"Free Shared API Site"
理由: Should be 'free API proxy site' not 'Free Shared API Site'. The reference uses more specific terminology. | 'Free Shared API Site' should be 'Free API proxy site' to match the reference terminology | Inconsistent terminology for '公益站'; 'free API proxy site' is preferred based on other items. |
| tencent/HY-MT1.5-1.8B | Title: NPC公益站邀请码每日分享 Content: Thanks to the invite code from Lao You, we can now generate an invite code. We will update the invite code every day in the future. If you forget, you can kick me. | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | NPC Public Welfare Site Invitation Code Shared Daily | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Cc switch and codex issue? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Content: cc switch and codex issue? Now is codex logged in, but feels like still on cc switch, asked AI to find out no solution found, has anyone encountered this before? Must be a cc switch problem... | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Cc Switch and Codex Issue? Content: Is there still an issue with the Cc Switch even though Codex has logged in? I couldn't find a solution after asking AI; has anyone encountered this before? |
严重
[准确性]
"Is there still an issue with the Cc Switch even though Codex has logged in?"
理由: Missing key information: 'it feels like it's still routing through CC-Switch' is important context | Slightly restructures the question; original explicitly says Codex is logged in but traffic still goes through CC-Switch. Meaning mostly preserved but less explicit about routing behavior. | The translation misses the nuance '感觉还是在走 cc switch' (feels like it's still routing through CC-Switch). It changes the meaning from a routing issue to a general login issue. |
| Qwen/Qwen2.5-7B-Instruct | Title: Cc switch and codex issues? Content: cc switch and codex issues? Now it's codex login, but it feels like we're still using cc switch. Asked AI but couldn't find a solution. Has anyone else encountered this issue? It must be a cc switch problem... | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Problemi con il Cc switch e il codex? |
严重
[准确性]
"Problemi con il Cc switch e il codex?"
理由: The hypothesis only translates the title and omits the detailed content describing the technical issue. | The term 'Cc switch' should be 'CC-Switch' as in the reference. | Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. | Title is translated, but the detailed body about being logged into Codex yet still routing through CC-Switch and asking for solutions is omitted. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Issue with Cc switch and Codex? Content: There's an issue with the Cc switch and Codex. It seems like I'm logged in to Codex but still navigating through the Cc switch. I asked AI for help but couldn't find a solution. Has anyone else encountered this? It must be a problem with the Cc switch... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Issue with Cc switch and codex? Content: The cc switch is logged in as codex, but it feels like it's still going through the cc switch. I asked AI and couldn't find a solution. Has anyone else encountered this? It must be an issue with the cc switch... |
轻微
[风格]
"Issue with Cc switch and codex?"
理由: Should be 'CC-Switch' with hyphen | Inconsistent capitalization: 'Cc switch' vs reference 'CC-Switch' | Capitalization inconsistency; should be 'CC-Switch'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Cc switch and codex issue? | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Cc switch and Codex issues? Content: Cc switch and Codex issues? Now it’s logged in as Codex, but it still feels like it’s going through the Cc switch. I asked AI but couldn’t find a solution. Has anyone encountered this before? It must be an issue with the Cc switch... |
轻微
[术语]
"Cc switch"
理由: The term 'Cc switch' is used instead of 'CC-Switch', which is inconsistent with the reference. | Should be 'CC-Switch' with hyphen for consistency with reference | Should be 'CC-Switch' with hyphen as in reference. Also 'codex' should be 'Codex' with capital C. |
| google/gemma-3-1b-it | "en": "Are there any issues with CC switch and Codex?", "zh-TW": "CC switch 和 Codex 有沒有問題?", "ja": "CC switch と Codex に問題はありますか?", "ko": "CC switch 와 코덱에 문제가 있을까요?”, "de": "Gibt es Probleme mit CC Switch und Codex?", "fr": "Y a-t-il des problèmes avec CC Switch et Codex ?", "it": "Ci sono problemi con CC Switch e Codex?" } |
严重
[准确性]
""en": "Are there any issues with CC switch and Codex?",
"zh-TW": "CC switch 和 Codex 有沒有問題?",
"ja": "CC switch と Codex に問題はありますか?",
"ko": "CC switch 와 코덱에 문제가 있을까요?”,
"de": "Gibt es Probleme mit CC Switch und Codex?",
"fr": "Y a-t-il des problèmes avec CC Switch et Codex ?",
"it": "Ci sono problemi con CC Switch e Codex?"
}"
理由: The hypothesis includes translations in multiple languages, which does not match the reference. | Hypothesis provides multiple language translations but target language is zh (Chinese). Should provide Chinese translation only. | Output is a multilingual dictionary/object instead of a single translation; it also truncates the original content to just the title question and omits the descriptive sentence about being logged in to Codex but still using CC-Switch. |
| google/gemma-3-4b-it | Title: Cc switch and Codex issues? Content: Cc switch and Codex issues? Currently, Codex is logging in, but it still feels like it’s using the Cc switch. I asked AI for help but couldn’t find a solution. It must be a Cc switch problem... |
轻微
[术语]
"Cc switch"
理由: Inconsistent capitalization: should be 'CC-Switch' to match reference | Should be 'CC-Switch' (with hyphen) for consistency with reference and proper formatting. | Product name appears as "CC-Switch" in reference; capitalization and hyphen should match for proper terminology. |
| google/translategemma-12b-it | Title: Issues with Cc Switch and Codex? Content: Problems with cc switch and codex? I'm logged in with codex, but it feels like it's still using the cc switch. I asked AI for help, but couldn't find a solution. Has anyone else encountered this? It's definitely a problem with the cc switch... | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Issues with Cc switch and Codex? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | "en": "The title reads as follows: 'Cc switch 和 codex 问题?'", "zh-TW": "標題如下:‘Cc切換和Codex問題?’", "ja": "タイトルは次のようになっています:‘CcスイッチとCodexの問題?’", "ko": "제목은 다음과 같습니다: ‘Cc 스위치와 Codex 문제?’", "de": "Die Überschrift lautet wie folgt: 'Cc-Schalter und Codex-Frage?'", "fr": "Le titre est le suivant : 'Question sur le commutateur Cc et le Codex ?'", "it": "Il titolo è il seguente: 'Domanda sul commutatore Cc e sul Codex?'} |
轻微
[准确性]
""en": "The title reads as follows: 'Cc switch 和 codex 问题?'""
理由: The English part is a meta-description and does not translate ‘问题’ as ‘issues/questions/problems’ in natural English; it just quotes the Chinese. | Hypothesis is a placeholder text with JSON format rather than an actual translation. The reference shows the correct translation should be 'CC-Switch and Codex issues?'. | Hypothesis provides multilingual metadata and context notes rather than a direct English translation. The format is incorrect and does not provide the actual translation. | The hypothesis is a malformed JSON object containing meta-commentary and translations into unrelated languages instead of the required English translation. |
| tencent/HY-MT1.5-7B | "en": "Title: Issues with the Cc Switch and Codex?", "zh-TW": "标题:Cc Switch和Codex的问题?", "ja": "タイトル:Cc SwitchとCodexの問題?」, "ko": "제목: Cc Switch와 Codex의 문제?", "de":="Titel: Probleme mit dem Cc Switch und Codex?", "fr": "Titre : Problèmes avec le Cc Switch et Codex ?", "it": "Titolo: Problemi con il Cc Switch e Codex?" }|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: Since openai great benevolent man codex's free API sites keep popping up one after another, I spent an afternoon posting messages and registered 7-8 sites. Is there any way to manage them? I feel like I've wasted time on some of them (fidgeting with my hands). Should we add another layer of proxy? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Content: Since openai great man codex's free API site keeps popping up one after another, I spent an afternoon posting and registered 7-8. Is there any way to manage this? It feels like I have wasted my time (fidgeting). Are we going to add another layer of proxy? |
严重
[流畅性]
"openai great man codex's free API site"
理由: Unnatural and confusing phrase order; "great man" is odd for "大善人", and structure obscures meaning. | The reference uses 'generous', which is more appropriate. | Translation is too literal and ungrammatical - 'great man' should be 'generous benefactor' and overall sentence structure is awkward | Multiple fluency issues: 'openai great man' should be 'Thanks to the generous OpenAI'. 'wasted my time' should be 'registered for some in vain'. 'fidgeting' should be 'just itchy hands'. 'add another layer of proxy' should be 'set up another layer of reverse proxy'.
严重
[准确性]
"posting"
理由: Mistranslation of '刷了一下午帖子' (browsing/scrolling through posts for an afternoon) as 'posting' (creating posts). | User browsed posts rather than posting; nuance is reversed. | Incorrect - source says '刷了一下午帖子' meaning 'browsing/scrolling posts' not 'posting' |
| Qwen/Qwen2.5-14B-Instruct | Title: There are too many registered free shared sites, how to manage Content: Due to the generosity of openai, codex's free shared sites keep popping up. I spent half a day registering about 7-8 of them. Is there any way to manage them? It feels like they're just being registered without much use (itchy fingers). Should I add another layer of redirection? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | There are too many registered public proxies, how to manage them? Due to OpenAI's generous actions, one public proxy after another has emerged from Codex. I spent an afternoon posting and registered 7-8 of them. Is there any management method? It feels like it was all for nothing (just out of curiosity). Do we need to add another layer of forwarding? |
轻微
[准确性]
"posting"
理由: Misinterpreted '刷...帖子' (browsing/scrolling posts) as 'posting' (creating posts). | “刷了一下午帖子” means browsing/reading posts, not creating posts; “spent an afternoon browsing posts” is more accurate. | 'posting' should be 'browsing posts' as in reference.
轻微
[术语]
"another layer of forwarding"
理由: “再套一层中转” is closer to “another layer of reverse proxy”; “forwarding” is not the usual networking term here. | 'Reverse proxy' is the precise technical term for '中转'. | 'forwarding' should be 'reverse proxy' as in reference, which is the correct technical term. |
| Qwen/Qwen3-14B | Troppi siti API gratuiti da registrare, come gestirli |
严重
[准确性]
"Troppi siti API gratuiti da registrare, come gestirli"
理由: The hypothesis only translates the title and omits the entire content paragraph. | Title is fine, but the body describing OpenAI generosity, registering 7–8 sites, feeling some registrations are wasted, and considering adding another proxy layer is omitted. | Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. | The term 'gestirli' is less precise than 'manage them' in the reference. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Too many registered free shared API sites; how to manage them? Content: Thanks to OpenAI's generous contributions, new public proxies keep appearing one after another. I spent the afternoon posting and registered about 7-8 accounts. Is there any management method? It feels like registering is wasted effort (just itching to do more). Should we add another layer of intermediaries? |
轻微
[术语]
"free shared API sites"
理由: Inconsistent translation of '公益站' - hypothesis uses 'free shared API sites' while reference uses 'free API proxy sites' | Less standard than "free API proxy sites" for 公益站 in this community | Should be 'free API proxy sites'.
轻微
[准确性]
"posting"
理由: Misinterpreted '刷了一下午帖子' (browsing/scrolling through posts for an afternoon) as 'posting' (creating content). | 刷帖子 means browsing/reading posts rather than posting them | Incorrect translation - 'posting' vs reference's 'browsing posts' - changes meaning
轻微
[术语]
"intermediaries"
理由: '中转' in this context specifically refers to a 'reverse proxy' or 'relay', 'intermediaries' is too vague. | Word choice 'intermediaries' differs from reference's 'reverse proxy' - technical term missing | "Reverse proxy" is more accurate than generic "intermediaries" for 再套一层中转 in this technical context |
| Qwen/Qwen3.5-9B | There are too many registered free shared API sites; how do we manage them? Due to OpenAI's generosity, Codex public proxies keep appearing one after another. I spent an afternoon browsing posts and registered 7-8 accounts. Is there any management method? It feels like registering was pointless (itchy fingers). Should we add another layer of relay? |
轻微
[术语]
"There are too many registered free shared API sites"
理由: Changed 'free API proxy sites' to 'free shared API sites' - terminology inconsistency | Should be 'I've registered for too many free API proxy sites' to match reference | Should be 'free API proxy sites'.
轻微
[术语]
"Codex public proxies"
理由: Changed 'Codex free API proxy sites' to 'Codex public proxies' - terminology difference | Inconsistent terminology; should remain 'free API proxy sites'. | Should be 'Codex free API proxy sites' to match reference |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The number of free shared API sites is too many; how can they be managed? | 通过 (无共识错误) |
| google/gemma-3-12b-it | There are too many free API sites registered, how should they be managed? Due to the generosity of OpenAI and Codex, free API sites keep popping up. I spent an afternoon browsing posts and registered 7 or 8. Are there any management methods? Some feel like they were registered for nothing (just feeling itchy). Should I add another layer of relaying? |
轻微
[术语]
"another layer of relaying"
理由: Reference uses 'reverse proxy' which is the correct technical term. | The source implies 'another layer of reverse proxy'; 'relaying' is understandable but less precise than 'reverse proxy' in this technical context. | '中转' in this context refers to a 'reverse proxy'. 'Relaying' is vague. 'Reverse proxy' (as in reference) is the technical term. |
| google/gemma-3-1b-it | Due to OpenAI's Good Deeds, there are too many public APIs, how to manage? |
严重
[准确性]
"Due to OpenAI's Good Deeds, there are too many public APIs, how to manage?"
理由: Misses key term “公益站” meaning free API proxy sites, not generic ‘public APIs’; also omits information about registering 7–8 sites and management concerns. | The hypothesis only translates the title and omits the entire content paragraph. It also mistranslates '公益站' as 'public APIs' instead of 'free API proxy sites'. | Incorrect translation: '公益站' means 'free API proxy sites' not 'public APIs'. '大善人' is slang for generous providers (like OpenAI/Codex), not literal 'Good Deeds'. Missing context about Codex. | Hypothesis is incomplete and does not capture the full meaning. Should include 'free API proxy sites' and the context about registering for multiple services. |
| google/gemma-3-4b-it | There are too many registered public proxy sites, how do I manage them? |
轻微
[术语]
"public proxy sites"
理由: Inconsistent with other items where '公益站' is translated as 'free API proxy sites'. 'Public' might imply open access without registration, whereas these are likely community-shared. | "公益站" here specifically means free API proxy sites; "public" is vague compared to "free API proxy" used in the reference. | Reference uses 'free API proxy sites' which is more specific and accurate than generic 'public proxy sites'. |
| google/translategemma-12b-it | Title: There are too many registered public API sites, how should they be managed? Content: Due to OpenAI's generosity, codex's public API sites keep popping up one after another. I spent the afternoon browsing posts and registered 7 or 8 of them. Are there any management methods? It feels like some were registered for nothing (just because I wanted to). Should we add another layer of redirection? | 通过 (无共识错误) |
| google/translategemma-4b-it | Too many registered public API sites (公益站). How can I manage them? |
严重
[准确性]
"public API sites (公益站)"
理由: Left the source term '公益站' in parentheses instead of translating the slang meaning properly. | "公益站" here refers specifically to free API proxy sites; "public API sites" is too generic and misses the proxy/relay nuance. | Incomplete translation. Missing content from source. Should include full title translation and the hypothesis only provides partial translation without the full context. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 注册的公益站太多了,怎么管理' in en |
轻微
[准确性]
"Translation of 'Title: 注册的公益站太多了,怎么管理' in en"
理由: No actual translation content; only a meta statement about translation. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'I've registered for too many free API proxy sites, how do I manage them?'. |
| tencent/HY-MT1.5-7B | There are too many registered public welfare sites; how can they be managed? | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: The CLaw teaches this piece |
致命
[准确性]
"The CLaw teaches this piece"
理由: '龙虾教这一块' refers to the OpenClaw hype/cult around it, not 'The Claw teaches this piece'; meaning is lost. | Complete mistranslation. '龙虾教' refers to the 'OpenClaw cult/hype/community', not 'CLaw teaches'. '这一块' means 'regarding this aspect', not 'this piece'. | Completely incorrect translation; '龙虾教这一块' should be 'Regarding the OpenClaw cult/hype' or similar | Mistranslation of '龙虾教' - reference shows 'OpenClaw cult/hype', hypothesis incorrectly translates as 'CLaw teaches'; missing context of 'hype' |
| CohereLabs/tiny-aya-water | What do you think about openclaw? I feel that its packaging is not enough, right? |
严重
[准确性]
"What do you think about openclaw? I feel that its packaging is not enough, right?"
理由: Severely incomplete translation. Missing entire context about self-media, capital, and token selling. Only translates first two sentences and omits the main argument about lack of useful applications. | Severe omission. Misses the speculation about traffic farming, capital selling tokens, and the lack of useful applications. | Translation is incomplete - missing the full context about '自媒体想写教程吃流量和资本想卖token' (content creators wanting traffic and capitalists wanting to sell tokens) and '实在看不到什么很有用的应用' (don't see useful applications) | Only translates the first sentence; omits speculation about hype, traffic, token selling, and lack of useful applications. |
| Qwen/Qwen2.5-14B-Instruct | Title: The Lobster Cult Content: Speaking of friends, what do you think about OpenClaw? I feel that this thing is not well encapsulated enough. Is the hype due to self-media wanting to write tutorials for traffic and capital wanting to sell tokens? Honestly, I can't see any particularly useful applications. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Crab teaches this part Content: So, what do Lao friends think about openclaw? I don't think the encapsulation is good enough either. Is this hype driven by both media outlets wanting to write tutorials for traffic and capital wanting to sell tokens? |
轻微
[准确性]
"Crab teaches this part"
理由: Mistranslation: '龙虾' is the Chinese nickname for 'OpenClaw', not 'Crab'. Reference correctly uses 'OpenClaw'. | Should be 'Regarding the OpenClaw cult/hype...' not 'Crab teaches this part'. The title is completely mistranslated. | Complete mistranslation of '龙虾教这一块' (Regarding the OpenClaw cult/hype). 'Crab' is wrong animal, 'teaches' is wrong verb. | The title “龙虾教这一块” is a joking name for the OpenClaw topic/community, not “crab” or “teaching”; the translation completely misreads the subject and meaning. |
| Qwen/Qwen3-14B | Sezione Educazione Lobster |
致命
[准确性]
"Sezione Educazione Lobster"
理由: The source '龙虾教这一块' is a slang phrase meaning 'Regarding the OpenClaw cult/hype'. The hypothesis mistranslates it as 'Lobster Education Section', which is nonsensical in context. | Completely mistranslates ‘龙虾教这一块’, which refers to the ‘OpenClaw cult/hype area/topic’, not an ‘Education Lobster section’. | The term 'Educazione Lobster' is less precise than 'Regarding the OpenClaw cult/hype...' in the reference. | Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. Also 'Lobster' is incorrect - '小龙虾' refers to OpenClaw, not a lobster. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. Also '龙虾教' should translate to 'OpenClaw cult/hype' not 'Lobster Education' |
| Qwen/Qwen3.5-4B | Title: Lobster teaches this part Content: So what do the folks think about openclaw? I feel like its encapsulation isn't enough either. Is all this hype driven by self-media wanting to write tutorials for traffic and capital wanting to sell tokens? |
严重
[准确性]
"Lobster teaches this part"
理由: Completely mistranslated '龙虾教这一块' (Regarding the OpenClaw cult/hype/scene). '教' here refers to a 'cult' or 'faction' of users, not 'teaching'. | 龙虾教 is a slang about the OpenClaw hype/cult, not literal "Lobster teaches"; translation loses intended critical/hype nuance | Incorrect translation of '龙虾教' - hypothesis uses 'Lobster teaches' but reference correctly interprets as 'OpenClaw cult/hype' | Should be 'Regarding the OpenClaw cult/hype' - completely different meaning |
| Qwen/Qwen3.5-9B | Title: The Lobster School is good at this Content: Speaking of how old friends view openclaw, I feel its encapsulation isn't enough either. Is the hype driven by self-media wanting to write tutorials for traffic and capital wanting to sell tokens? Honestly, I can't see any very useful applications. |
致命
[准确性]
"The Lobster School is good at this"
理由: Incorrect translation: '龙虾教' refers to 'OpenClaw cult/hype', not 'Lobster School'; reference correctly uses 'OpenClaw cult/hype' | Title is completely wrong; should be 'Regarding the OpenClaw cult/hype...' - 龙虾 refers to OpenClaw, not a school | “龙虾教这一块” refers to the cult-like discussion/trend around OpenClaw; the title is about the OpenClaw "cult"/hype, not “the Lobster School is good”, so it is a complete misreading. | Completely mistranslates the title. '龙虾教这一块' means 'Regarding the OpenClaw cult/hype', not that the school is good at something. | Incorrect term; should be 'OpenClaw' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The title is 'Dragon Shrimp Teach This Part'. The content says, 'Friends, what do you think about openclaw? I feel that its encapsulation isn't very good either. It's so popular now—do you think it's driven by self-media wanting to write tutorials for traffic and capital selling tokens, or is there something really useful coming from it?' |
严重
[准确性]
"The title is 'Dragon Shrimp Teach This Part'."
理由: Incorrect translation of '龙虾教' - should be 'OpenClaw cult/hype' not 'Dragon Shrimp Teach' | “龙虾教这一块” refers to the “cult/circle/trend” around OpenClaw, not the literal “Dragon Shrimp Teach This Part”; the title is misinterpreted | Inaccurate title translation; should be 'Regarding the OpenClaw cult/hype' or similar. 'Dragon Shrimp Teach This Part' is nonsensical | Hallucinated structure. The model explicitly states 'The title is...' instead of just providing the translation. '龙虾教' refers to the 'OpenClaw cult/fanbase', not 'Dragon Shrimp Teach'. |
| google/gemma-3-12b-it | Title: Lobster Cult This Section Content: So how do the old friends see openclaw? I don't think this thing has enough encapsulation either. Is it being hyped so much because media creators want to write tutorials for traffic and capital wants to sell tokens, a joint push? Really can't see any useful applications. |
严重
[术语]
"Lobster Cult"
理由: The term 'Lobster Cult' is used instead of 'OpenClaw', which is a critical error as it misrepresents the subject. | 龙虾/小龙虾 stands for 'OpenClaw' in this forum slang; literal 'Lobster' loses the specific product name and related nuance. | 'Lobster Cult' is incorrect - reference uses 'OpenClaw' which is the actual product name. '龙虾' is a nickname for OpenClaw.
轻微
[风格]
"old friends"
理由: '佬友们' is slang for 'bros' or 'folks' in the tech community. 'Old friends' sounds unnatural and overly formal/literal. | 佬友们 is closer to 'bros' or 'folks'; 'old friends' adds unintended sentiment and formality. | Reference uses 'bros' which matches the slang style better than 'old friends'. |
| google/gemma-3-1b-it | Translation of '龙虾教这一块' in en |
严重
[准确性]
"Translation of '龙虾教这一块' in en"
理由: Missing translation of the content section. | Meta-statement rather than a translation; fails to convey meaning such as “Regarding the OpenClaw cult/hype in this area.” | The hypothesis is a meta-commentary describing the task rather than providing the actual translation of the source text. | Hypothesis is incomplete and does not provide actual translation. Should translate to something like 'Regarding the OpenClaw cult/hype' as shown in reference. | Hypothesis is not a translation but a placeholder indicating translation task. Missing actual translation content. |
| google/gemma-3-4b-it | "en": "Title: Lobster Tutorial This Section\nContent: Let me ask everyone how they think about openclaw, I feel that its encapsulation is not enough, why it has become so popular, is it because self-media wants to write tutorials and eat traffic, and capital wants to sell tokens? I really can’t see any useful applications.”, "zh-TW": "標題:龍蝦教學這一塊\n內容:說實在話,大家怎麼看openclaw?我覺得這東西封裝度不夠,炒這麼火到底是自媒體想寫教程、博取流量,還是資本想賣代幣推動的呢?實在看不出什麼有用的應用。", "ja": "タイトル:エビ教のこのセクション\nコンテンツ:皆さんはオープンクラウについてどう思いますか?エンクロージャーが足りないと感じています。なぜこんなに人気が出たのか疑問です。メディアがチュートリアルを書き、トラフィックを集めたり、資本がトークンを販売したりする共同の推進力でしょうか。「役に立つアプリケーションが見つかりません」", "ko": "제목: 게 교의 이 부분\n내용: 여러분은 오픈클라우에 대해 어떻게 생각하시나요? 이 제품의 포장도가 부족하다고 생각합니다. 왜 이렇게 뜨거워졌는지 궁금합니다. 미디어가 튜토리얼을 쓰고 트래픽을 모으거나 자본이 토큰을 판매하는 공동 추진력이었을까요? 유용한 애플리케이션이 보이지 않습니다.", "de": "Titel: Lammelgarn-Tutorial Dieser Abschnitt\nInhalt: Was denkt ihr über Openclaw? Ich finde, die Verkapselung ist nicht gut genug. Warum es so populär geworden ist, weiß ich nicht. Ist es das Zusammenspiel von Self-Media, die Tutorials schreiben und Traffic generieren wollen, und Kapital, das Token verkaufen will?", "fr": "Titre : Tutoriel sur les homards Cette section\nContenu : Qu'est-ce que vous pensez de l'openclaw ? Je trouve que son emballage n'est pas suffisant. Pourquoi il est devenu si populaire, je ne sais pas. Est-ce le résultat d'une collaboration entre les médias sociaux qui veulent écrire des tutoriels et générer du trafic, et le capital qui veut vendre des jetons ?", "it": "Titolo: Tutorial sul Gamberetto Questa sezione\nContenuto: Cosa ne pensate dell'openclaw? Trovo che la sua incapsulazione non sia sufficiente. Non capisco perché sia diventato così popolare. È stato spinto da una collaborazione tra i media sociali che vogliono scrivere tutorial e generare traffico e dal capitale che vuole vendere token?" } <|im_end|> | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: The Lobster Clan Content: What do you guys think about openclaw? I don't think this thing has enough encapsulation. Is it being hyped so much because media outlets want to write tutorials and generate traffic, and investors want to sell tokens? I really don't see any useful applications. |
严重
[准确性]
"The Lobster Clan"
理由: '龙虾教' refers to the hype/cult surrounding 'OpenClaw'. 'Lobster Clan' sounds like a group of people rather than the phenomenon/tool hype. Reference 'OpenClaw cult/hype' is better. | Should be 'Regarding the OpenClaw cult/hype...' per reference. Title completely misses the meaning. | Critical mistranslation: '龙虾教' is a pun/slang referring to 'OpenClaw' (the software), not literal lobsters. Reference correctly identifies it as 'OpenClaw cult/hype' | Should be 'OpenClaw' as per reference. |
| google/translategemma-4b-it | Title: How to use openclaw |
严重
[准确性]
"How to use openclaw"
理由: Completely mistranslated - the source is asking for opinions about OpenClaw (the hype/cult), not how to use it. The reference correctly translates as 'Regarding the OpenClaw cult/hype...' | Source "龙虾教这一块" and content discuss opinions on hype around OpenClaw, not a how-to-use question; the hypothesis changes the topic to usage instructions. | Complete hallucination/mistranslation. The source asks for opinions on the 'hype/cult' ('龙虾教'), not a tutorial on how to use it. | Inaccurate title translation. Should be 'Regarding the OpenClaw cult/hype...' not 'How to use openclaw'. The source discusses opinions about OpenClaw, not usage instructions. |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾教这一块' in en |
严重
[准确性]
"Translation of '龙虾教这一块' in en"
理由: Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'Regarding the OpenClaw cult/hype...'. | No translation is provided; only a reference to the source text is present. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | The hypothesis does not match the reference content. |
| tencent/HY-MT1.5-7B | "en": "Title: The LinuxDo Community on Lobster Teaching", "zh-TW": "Title: The LinuxDo Community on Lobster Teaching", "ja": "Title: The LinuxDo Community on Lobster Teaching", "ko": "Title: The LinuxDo Community on Lobster Teaching", "de": "Titel: Die LinuxDo Gemeinschaft zu Lobster Teaching", "fr": "Titre: La communauté LinuxDo sur l’enseignement du lobster”, "it": "Titolo: La comunità LinuxDo sull’insegnamento del lobster"} |
致命
[准确性]
"The LinuxDo Community on Lobster Teaching"
理由: Completely misses that '龙虾教' is a joking nickname for OpenClaw and that '教' carries the sense of a cult or cult-like fervor; renders it as a teaching topic and introduces a fabricated 'LinuxDo Community', seriously distorting the subject. | Complete hallucination/mistranslation. '龙虾教' refers to the 'OpenClaw cult/hype', not 'LinuxDo Community' or 'Teaching'. The hypothesis ignores the actual meaning. | Completely incorrect translation. Should be 'Regarding the OpenClaw cult/hype...' The hypothesis misses the entire meaning and context of the source. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | How can I make Gemini use third-party APIs within cc-switch? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | How can I make Gemini use third-party APIs within cc-switch? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: How can gemini use third-party APIs from cc-switch? Content: I can skip this step when running on Windows PowerShell, but not when running under WSL. Does anyone know why? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: How can I make Gemini use third-party APIs from cc-switch Content: I can skip this step when running in Windows PowerShell, but not in WSL. Does anyone know why? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Hypothesis is incomplete/malformed JSON fragment. Should be a complete English translation of the Chinese title | The hypothesis is a broken string containing XML tags and ellipsis, failing to provide any translation of the source text. | The translation is incomplete and does not match the reference. |
| Qwen/Qwen3.5-4B | How to make Gemini use third-party APIs from cc-switch I can run this under PowerShell on Windows and skip this step, but I cannot skip this step when running under WSL. Does anyone know what's going on? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: How can I make Gemini use third-party APIs from cc-switch? Content: It works on PowerShell under Windows and skips this step, but when running under WSL, I cannot skip this step. Does anyone know what's going on? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | How can I make Gemini use the third-party API from cc-switch? | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: How can I make Gemini use third-party APIs from cc-switch? Content: I can skip this step when running under PowerShell on Windows, but I cannot skip it when running under WSL. Does anyone know what’s going on? | 通过 (无共识错误) |
| google/gemma-3-1b-it | How to make Gemini use the cc-switch API in Windows? |
严重
[准确性]
"How to make Gemini use the cc-switch API in Windows?"
理由: Incorrect translation: Source asks about using third-party API in CC-Switch, not specifically in Windows. Hypothesis incorrectly specifies 'in Windows' which changes the meaning. Also 'cc-switch' should be 'CC-Switch' with proper capitalization. | The hypothesis omits the content regarding WSL vs PowerShell and incorrectly limits the scope to Windows, missing the core technical contrast in the source. | Misses ‘third-party API’ and context that it's within CC-Switch; also wrongly constrains to Windows, whereas the source is about difference between PowerShell on Windows and WSL. | Hypothesis is incomplete and misses key details. Should include 'third-party API' and the full context about PowerShell vs WSL. |
| google/gemma-3-4b-it | How to make Gemini use third-party APIs from cc-switch? | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: How can I make Gemini use third-party APIs from cc-switch? Content: I can skip this step when running PowerShell on Windows, but I cannot skip it when running on WSL. Does anyone know what's going on? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: How to make Gemini use third-party APIs from cc-switch | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | How to make Gemini use third-party APIs in cc-switch | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | "en": "Title: How to enable Gemini to use third-party APIs within cc-switch", "zh-TW": "标题:如何让Gemini使用cc-switch中的第三方API", "ja": "タイトル:Geminiでcc-switch内の第三者APIを使用する方法", "ko": "제목: Gemini에서 cc-switch의 제3자 API를 사용하는 방법", "de":="Titel: Wie kann Gemini die drittanbieterischen APIs innerhalb von cc-switch verwenden?", "fr":="Titre : Comment permettre à Gemini d'utiliser les API tiers dans cc-switch", "it":="Titolo: Come far sì che Gemini utilizzi le API di terze parti all’interno di cc-switch"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Codex Account Pool Placement on CPA How to Specify Thinking? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Codex Account Pool Placement on CPA How to Specify Thinking? |
严重
[准确性]
"Codex Account Pool Placement on CPA How to Specify Thinking?"
理由: Incorrect translation - '思考' here refers to 'reasoning effort' (thinking budget) not literal 'thinking' | Awkward phrasing. Should be 'How to specify reasoning effort when placing a Codex account pool in CPA?' The word order and structure are incorrect. | The reference uses 'reasoning effort', which is more precise. |
| Qwen/Qwen2.5-14B-Instruct | Title: How to specify thoughts when placing the Codex Account Pool in CPA? Content: How to specify thoughts when placing the Codex Account Pool in CPA? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Codex Pool placed with CPA how to specify thinking? Content: Codex Pool placed with CPA how to specify thinking? |
严重
[流畅性]
"Codex Pool placed with CPA how to specify thinking?"
理由: Word-for-word translation resulting in broken syntax. Needs restructuring. | Sentence structure is unnatural and hard to understand; it lacks prepositions and correct collocations, e.g. 'how to specify reasoning effort when placing a Codex account pool in CPA?' | Should be 'How to specify reasoning effort when placing a Codex account pool in CPA?' as in the reference. |
| Qwen/Qwen3-14B | Title: How to specify thinking when placing Codex account pool in CPA Content: How to specify thinking when placing Codex account pool in CPA |
轻微
[术语]
"How to specify thinking"
理由: In the context of AI models, '思考' often refers to 'reasoning effort' or 'reasoning mode'. 'Thinking' is acceptable but less precise technically. | The word 'thinking' should be 'reasoning effort' to match the reference more precisely. However, this is a minor terminology choice. | ‘思考’ here refers to ‘reasoning effort’; ‘thinking’ is understandable but less precise in this technical context. |
| Qwen/Qwen3.5-4B | Title: Codex Account Pool - How to specify thinking when placed on CPA? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: How to specify thinking when placing Codex Account Pool in CPA? Content: How to specify thinking when placing Codex Account Pool in CPA? |
轻微
[术语]
"thinking"
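理由: Should be 'reasoning effort' to match reference | In the context of AI models, 'reasoning effort' or 'thinking mode' is more precise than just 'thinking'. | In this product context, the source '指定思考' refers specifically to the model's reasoning intensity or thinking budget and should be translated as 'reasoning effort' or similar; 'thinking' is too generic and does not match the reference. |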
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: How to specify thought in CPA with the Codex account pool? | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: How to specify thinking when Codex token pool is placed in CPA? Content: How to specify thinking when Codex token pool is placed in CPA? |
轻微
[术语]
"thinking"
理由: The term 'thinking' is used instead of 'reasoning effort', which is less precise. | 指定思考 in this context is about configuring 'reasoning effort' or similar, not generic 'thinking'; under-specified compared to domain usage. | In the context of AI models, 'reasoning effort' (as in reference) is the standard term for '思考', whereas 'thinking' is too generic. |
| google/gemma-3-1b-it | Codex Account Pool - How to specify thinking when placing on CPA? |
轻微
[术语]
"How to specify thinking"
理由: '思考' in this context means 'reasoning effort' (thinking mode), not literal 'thinking'. Should be 'reasoning effort' or 'reasoning mode'. | “指定思考” in this technical context is better as “specify reasoning effort” or similar; “thinking” is vague though understandable. | Hypothesis uses 'thinking' but reference uses 'reasoning effort' which is more accurate for '思考'. |
| google/gemma-3-4b-it | Title: Codex Account Pool Placement in CPA - How to Specify Thinking? |
严重
[准确性]
"Placement in CPA - How to Specify Thinking?"
理由: Incorrect translation: '思考' in this context means 'reasoning effort', not literal 'thinking'. The reference correctly translates this as 'reasoning effort' | Incomplete translation. Reference shows it should be 'How to specify reasoning effort when placing a Codex account pool in CPA?' The hypothesis is missing the context about 'reasoning effort' and 'placing'. | The term 'thinking' is less specific than the reference's 'reasoning effort.' | "指定思考" here means specifying reasoning effort/level for the model, not generic 'thinking'; current phrasing is unclear about the technical meaning. | '思考' in the context of LLMs (like Codex/DeepSeek) refers to 'reasoning effort' or 'thinking mode' (e.g., CoT), not just 'thinking'. 'Reasoning effort' is the standard term. |
| google/translategemma-12b-it | Title: How to specify the placement of the Codex account pool within CPA? Content: How to specify the placement of the Codex account pool within CPA? |
严重
[准确性]
"How to specify the placement of the Codex account pool within CPA?"
理由: Critical mistranslation: The source asks about '指定思考' which means 'specify reasoning effort/thinking', not 'placement'. Reference correctly translates as 'specify reasoning effort' | Misinterpretation of '指定思考'. In the context of AI models, '思考' refers to 'reasoning effort' or 'thinking process', not 'placement'. The translation makes no sense technically. | Should be 'How to specify reasoning effort when placing a Codex account pool in CPA?' per reference. Missing 'reasoning effort' concept. | Should be 'reasoning effort' as per reference. |
| google/translategemma-4b-it | Title: Codex Account Pool - How to specify thinking in CPA? Content: Codex Account Pool - How to specify thinking in CPA? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'Codex Account Pool and How to Specify Thinking in CPA' in en |
轻微
[准确性]
"Translation of 'Codex Account Pool and How to Specify Thinking in CPA' in en"
理由: Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'How to specify reasoning effort when placing a Codex account pool in CPA?'. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | Partially translated phrase with added structure; ‘specified thinking’ is an inaccurate rendering of ‘指定思考’ (reasoning effort). Also only meta text is given, not a clean translation. |
| tencent/HY-MT1.5-7B | Translation of '助手' in en |
致命
[准确性]
"Translation of '助手' in en"
理由: Translation is completely off-topic. | The output is unrelated to the source and does not translate the title 'Codex 号池 放在CPA如何指定思考'; this is a completely wrong translation. | Hypothesis is incomplete and does not translate the source title. Should be 'How to specify reasoning effort when placing a Codex account pool in CPA?' |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Is this one-click install software written by gemini... |
致命
[准确性]
"Is this one-click install software written by gemini..."
理由: Only translates a fragment from the content and omits the main question in the title about Gemini web version being downgraded, as well as the rest of the content about Windows Terminal and OpenClaw fonts. | Mistranslation of '降智' - reference correctly identifies as 'shadowbanned/downgraded', hypothesis only translates literally as 'written by gemini' | Mistranslation of '降智' (downgraded intelligence/shadowbanned) as part of the sentence structure, and missing the question format. Also missing Content translation. | Incomplete and incorrect; should be 'This one-click installation software was written with Gemini...' |
| CohereLabs/tiny-aya-water | What is this one-click install software written with gemini... |
严重
[准确性]
"What is this one-click install software written with gemini..."
理由: Incorrect translation - source asks 'Is the Gemini web version shadowbanned/downgraded?' not about the software | Incomplete and incorrect. Missing title translation entirely. Content is only partial sentence. Should include full context about font display issues in OpenClaw. | Hypothesis is a fragment that only covers part of the content and omits the main point questioning whether Gemini web version is downgraded and the advice about using Windows Terminal and OpenClaw font display. | Mistranslation of the title 'gemini 网页版是降智的吗' (Is the Gemini web version downgraded/dumbed down?). The hypothesis ignores the title's question entirely and misinterprets the content. |
| Qwen/Qwen2.5-14B-Instruct | Title: Is the Gemini web version nerfed? Content: This one-click installation software is written in Gemini... Experts suggest using Windows Terminal instead, as some fonts may not display properly with OpenClaw. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Is Gemini Web Edition dumbed down? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | La versione web di Gemini è stata depotenziata? |
严重
[准确性]
"La versione web di Gemini è stata depotenziata?"
理由: The term 'depotenziata' is less precise than 'shadowbanned/downgraded' in the reference. | Title is captured well, but accompanying content about the one-click installer written with Gemini and advice to use Windows Terminal due to OpenClaw font display issues is omitted. | The hypothesis only translates the title and omits the content paragraph about the installation software and font issues. | Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Is the web version of Gemini dumb? Content: This one-click install software is written with Gemini... Please use Windows Terminal instead if possible, otherwise OpenCLaw may not display some fonts. |
严重
[准确性]
"Is the web version of Gemini dumb?"
理由: Should be 'Is the Gemini web version shadowbanned/downgraded?' - completely different meaning | '降智' is slang for 'intelligence downgraded' or 'shadowbanned/restricted', 'dumb' is too literal and misses the technical implication of performance degradation. | Incorrect translation of '降智' - hypothesis uses 'dumb' but reference correctly uses 'shadowbanned/downgraded' which is the actual slang meaning |
| Qwen/Qwen3.5-9B | Title: Is Gemini Web Version Nerfed? Content: This one-click installer is written with Gemini... Everyone should try using Windows Terminal instead, otherwise OpenClaw won't display some fonts. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Is Gemini web version dumb? |
严重
[准确性]
"Is Gemini web version dumb?"
理由: Incorrect translation of '降智' - should be 'shadowbanned/downgraded' not 'dumb' | The translation is too informal and does not capture the reference's meaning of 'shadowbanned/downgraded.' | '降智' is slang for 'intelligence downgraded' or 'lobotomized', translating as 'dumb' is too colloquial and misses the nuance of a system downgrade. Reference uses 'shadowbanned/downgraded'. | 'dumb' is crude and inaccurate; should be 'shadowbanned/downgraded' or 'degraded' |
| google/gemma-3-12b-it | Title: Is the Gemini web version nerfed? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Is Gemini Web Version '降智' available? |
严重
[准确性]
"Is Gemini Web Version '降智' available?"
理由: Incorrect translation: '降智' in this context means 'shadowbanned/downgraded' (intelligence reduced), not literal 'available'. The reference correctly translates it as 'shadowbanned/downgraded'. | Failed to translate the slang term '降智' (downgraded intelligence/shadowbanned), leaving it in Chinese and adding nonsensical English. | ‘降智’ here means ‘nerfed/downgraded’, not about availability; the sentence is mis-phrased and loses intended meaning. | Hypothesis fails to properly translate '降智'. Reference shows 'shadowbanned/downgraded' which better captures the meaning. |
| google/gemma-3-4b-it | Is the Gemini web version downgraded? |
轻微
[准确性]
"downgraded"
理由: Incorrect translation: '降智' is internet slang meaning 'shadowbanned/downgraded' (intelligence reduced), not literal 'downgraded' | Incomplete translation. Reference shows 'Is the Gemini web version shadowbanned/downgraded?' - the hypothesis misses the 'shadowbanned' aspect which is important context. | Source implies something like shadowban/downgrading; "downgraded" captures only part of the idea and omits the possible restriction/shadowban nuance. | '降智' is slang for 'intelligence reduction' or 'lobotomized', often implying the model is performing worse or is 'dumber'. 'Downgraded' is acceptable but 'lobotomized' or 'intelligence-reduced' captures the slang nuance better. However, 'downgraded' is passable. |
| google/translategemma-12b-it | Title: Is the web version of Gemini nerfed? Content: This one-click installation software is written using Gemini... Dear users, please try to use Windows Terminal instead, otherwise some fonts may not be displayed with OpenClaw. |
轻微
[术语]
"nerfed"
理由: '降智' literally means 'intelligence reduced'. 'Nerfed' is a good gaming equivalent, but 'downgraded' or 'lobotomized' might be closer to the specific complaint about intelligence. Acceptable but 'shadowbanned' in reference is likely wrong, 'nerfed' is actually a good choice, though 'downgraded intelligence' is more literal. | Should be 'shadowbanned/downgraded' per reference for more accurate translation of '降智'. | Incorrect translation: 'nerfed' is gaming slang for weakening a character, but the source '降智' (reduced intelligence) and reference 'shadowbanned/downgraded' refer to the AI giving dumbed-down responses. Reference uses 'shadowbanned/downgraded' which is more accurate |
| google/translategemma-4b-it | Is the Gemini web version dumbed down? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Title: Is Gemini Web Version a Form of Intelligence Reduction? | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Is the Gemini web version designed to reduce intelligence? |
严重
[准确性]
"designed to reduce intelligence"
理由: Mistranslation of '降智' (slang for lobotomized/downgraded performance) as 'designed to reduce intelligence', missing the slang nuance of performance degradation. | Should be 'Is the Gemini web version shadowbanned/downgraded?' The hypothesis misinterprets '降智' which refers to being shadowbanned or downgraded, not literally reducing intelligence. | In this context '降智' should be understood as 'neutered/weakened/downgraded', not as making users less intelligent; the meaning drifts from the original complaint that the product's capability has been reduced. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Are there any good reviews for the Honda CBX750F? |
致命
[准确性]
"Are there any good reviews for the Honda CBX750F?"
理由: Completely unrelated to source, which asks about choosing between CFMOTO 450SR '26 Scorpion edition and Zhangxue 500RR; content is replaced with different motorcycle model and topic. | Completely wrong translation; should ask about CFMOTO 450SR vs Zhangxue 500RR motorcycles | Hallucination. The source text asks about 'CFMOTO 450SR' and 'Zhangxue 500RR', but the hypothesis invents 'Honda CBX750F'. | Completely wrong translation - source asks about motorcycles (春风450sr, 张雪500rr), hypothesis translates to Honda CBX750F which is completely unrelated | Incorrect translation of motorcycle brands. |
| CohereLabs/tiny-aya-water | Title: Are you looking for a new motorcycle? |
致命
[准确性]
"Are you looking for a new motorcycle?"
理由: Completely changes the meaning: original asks for advice between two specific motorcycle models; hypothesis turns it into a generic question and omits all key details. | Severely incomplete and inaccurate. Omits specific motorcycle models (CFMOTO 450SR, Zhangxue 500RR), the year specification, and the beginner context. Completely fails to capture the source content. | Completely mistranslated - source asks about specific motorcycle models (CFMOTO 450SR 26款天蝎 or Zhangxue 500RR), not a general question about looking for motorcycles | The reference provides more specific information about the motorcycle models. | Complete hallucination. The source asks for advice on specific models (CFMOTO 450SR vs Zhangxue 500RR) from knowledgeable people. The hypothesis changes the meaning entirely. |
| Qwen/Qwen2.5-14B-Instruct | Title: Anyone familiar with motorcycles? Want to ask about choosing between CFMOTO 450SR 26 or Zongshen 500RR Content: Folks, I want to ask about choosing between CFMOTO 450SR 26 and Zongshen 500RR as a beginner, can control myself |
严重
[准确性]
"Zongshen 500RR"
理由: Missing 'Scorpion edition' for the 450SR and 'Zhangxue' is not the correct brand name | Source “张雪500rr” is a brand/nickname different from Zongshen; mapping to Zongshen is not supported by source text. | '张雪' refers to a specific person/brand (Zhang Xue), not 'Zongshen'. This is a hallucination of the brand name. |
| Qwen/Qwen2.5-7B-Instruct | Is there anyone who knows about motorcycles? Want to ask which one to wait for between Feng Chun 450SR 26th edition Scorpion or Zhang Xue 500RR |
严重
[准确性]
"26th edition"
理由: Should be 'CFMOTO 450SR '26 Scorpion edition or Zhangxue 500RR'. The brand names are incorrectly translated. | '26款' means the 2026 model or '26 model year; '26th edition' does not follow vehicle naming conventions. | Multiple mistranslations: '春风' should be 'CFMOTO' (the brand name); '张雪' should be 'Zhangxue' (transliteration); missing 'SR' after 450SR; '天蝎' is 'Scorpion' but should be in quotes or noted as edition name. |
| Qwen/Qwen3-14B | C'è qualcuno appassionato di moto qui? Vorrei sapere se aspettare il Chongfeng 450SR 2026 Scorpion o comprare lo ZX-500RR |
轻微
[准确性]
"ZX-500RR"
理由: Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. | ‘张雪500rr’ is transliterated brand/model, but mapping it to ‘ZX-500RR’ is not supported by the source; this is an invention. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. | The hypothesis only translates the title and omits the content paragraph where the user identifies as a beginner. |
| Qwen/Qwen3.5-4B | Title: Do any motorcycle enthusiasts know? Asking whether to wait for the CFMOTO 450SR 2026 model or buy the Zongshen 500RR. New rider, can I control myself? |
严重
[准确性]
"Zongshen 500RR"
理由: 张雪 is likely a nickname/brand transliteration (e.g. Zhangxue); "Zongshen" is a different brand and may be incorrect | '张雪' refers to a specific designer/brand (Zhang Xue / Zontes context or specific custom bike), translating it as 'Zongshen' is a hallucination/error unless verified, but likely should be transliterated or kept as 'Zhangxue' as per reference. | Should be 'Are there any bros who know about motorcycles? I want to ask if I should wait for the CFMOTO 450SR '26 Scorpion edition or buy the Zhangxue 500RR?' - missing Scorpion edition and using wrong brand name |
| Qwen/Qwen3.5-9B | Title: Does anyone know about motorcycles? I want to ask whether to wait for the 2026 KTM 450SR Scorpion or buy a Zs 500RR. Newbie, can control my hands. |
严重
[准确性]
"2026 KTM 450SR Scorpion"
理由: The source '春风450sr 26款天蝎' refers to the CFMOTO brand; rendering it as 'KTM' gets the brand wrong and is seriously misleading. | Should be 'CFMOTO 450SR' not 'KTM 450SR' - 春风 is CFMOTO brand | Incorrect term; should be 'CFMOTO 450SR' | Mistranslates '春风' (CFMOTO) as 'KTM'. Also misinterprets '天蝎' (Akrapovič exhaust brand) as part of the model name or edition incorrectly without context, though reference treats it as edition. | Incorrect translations: '佬' is a colloquial term for 'bro', not removed; '春风450sr' is CFMOTO brand, not KTM; '天蝎' is Scorpion edition; '张雪' is Zhangxue, not 'Zs'
严重
[准确性]
"Zs 500RR"
理由: Mistranslates '张雪' (Zhangxue, a person/brand) as 'Zs'. | '张雪500rr' should be transliterated as Zhangxue 500RR; the origin of 'Zs' is unclear. | Should be 'Zhangxue 500RR' - 张雪 is Zhangxue brand |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Anybody knows about motorcycles? Want to ask whether the 26th Scorpio or buy a Suzuki 500 RR? |
严重
[准确性]
"26th Scorpio"
理由: Misinterprets '26 款' (2026 model/year) as '26th'. 'Scorpio' refers to the exhaust brand Akrapovič (天蝎), not the zodiac sign in this context, though 'Scorpion edition' is clearer. | Missing brand names and model details. Should be 'CFMOTO 450SR' and 'Zhangxue 500RR', not just 'Suzuki 500 RR' | '26款天蝎' is a model year plus edition and should be “'26 Scorpion edition”, not '26th Scorpio', which misreads it as the 26th Scorpio | Incorrect translation of '春风450sr 26款天蝎' - should be 'CFMOTO 450SR 26 Scorpion edition', not '26th Scorpio' |
| google/gemma-3-12b-it | Anyone here familiar with motorcycles? I want to ask if it’s better to wait for the CFMOTO 450SR 2026 model with Scorpion exhaust or buy a Zongshen RK 500RR. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Guys, I want to ask if you know anyone who is interested in buying a Springwind 450SR 26 model Tianxiang or buy a Snow White 500RR? |
严重
[准确性]
"if you know anyone who is interested in buying"
理由: Hypothesis mistranslates brand names and structure. Should be 'wait for the CFMOTO 450SR' and 'Zhangxue 500RR' as shown in reference. | Mistranslated '等...还是买' (wait for... or buy) as 'interested in buying', changing the meaning entirely. | The speaker is asking for advice on what they themselves should buy, not asking if the reader knows someone interested in buying.
严重
[术语]
"Springwind 450SR 26 model Tianxiang"
理由: Incorrect translations: '春风' is brand name 'CFMOTO', not 'Springwind'. '天蝎' is 'Scorpion', not 'Tianxiang'. '张雪' is 'Zhangxue', not 'Snow White'. These are transliterations/names, not literal translations. | Brand/model names are mistransliterated; should be “CFMOTO 450SR '26 Scorpion edition” or similar; ‘Springwind’ and ‘Tianxiang’ are incorrect for motorcycle model context. | Incorrect translation of the brand '春风' (CFMOTO); 'Springwind' is a literal error. |
| google/gemma-3-4b-it | Does anyone know about motorcycles? I want to ask if I should buy a CFMoto 450SR 26 or a Yamaha YZF-R7? |
轻微
[准确性]
"Yamaha YZF-R7"
理由: The source says '张雪 500rr' (Zhangxue 500RR), which is a specific Chinese motorcycle brand (Kove 500RR, founded by Zhang Xue). The hypothesis incorrectly translates this as 'Yamaha YZF-R7', which is a completely different bike. This is a hallucination. | Incorrect brand translation. Should be 'CFMOTO 450SR' not 'CFMoto', and 'Zhangxue 500RR' not 'Yamaha YZF-R7'. Also misses '26款天蝎' (26 Scorpion edition). | Incorrect translations: '春风450sr 26款天蝎' is 'CFMOTO 450SR 26 Scorpion edition', not generic 'CFMoto 450SR 26'. '张雪500rr' is a Chinese motorcycle brand/model name, not 'Yamaha YZF-R7' | "张雪500rr" is mistranslated as Yamaha YZF-R7, a completely different model/brand; this is a factual substitution error. |
| google/translategemma-12b-it | Does anyone have experience with motorcycles? I want to ask if it's better to wait for the CFMOTO 450SR 2026 model with the Tianxie version or buy the Zongshen Snow Leopard 500RR. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Does anyone know about motorcycles? I want to ask if I should buy a CFMoto 450SR 26 or a Suzuki GSX-R 500 RR. |
严重
[准确性]
"Suzuki GSX-R 500 RR"
理由: Incorrect translation - '张雪500rr' is a transliteration of 'Zhangxue 500RR' (a Chinese motorcycle brand/model), not 'Suzuki GSX-R 500 RR'. The reference correctly uses 'Zhangxue 500RR' | Incorrect translation of '张雪500rr'. Should be 'Zhangxue 500RR' (a Chinese motorcycle brand) not 'Suzuki GSX-R 500 RR'. This is a factual error. | "张雪500rr" refers to a specific Chinese domestic brand/model nickname, not a Suzuki model; introducing "Suzuki GSX-R" fabricates a different motorcycle brand. | Mistranslation of '张雪 500rr'. 'Zhangxue' is a specific Chinese motorcycle brand (Zhangxue Motorcycle), not Suzuki. |
| tencent/HY-MT1.5-1.8B | Translation of '有了解摩托车的佬吗,想问问等春风450sr 26款天蝎还是买张雪500rr' in en |
轻微
[准确性]
"Translation of '有了解摩托车的佬吗,想问问等春风450sr 26款天蝎还是买张雪500rr' in en"
理由: Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'Are there any bros who know about motorcycles? I want to ask if I should wait for the CFMOTO 450SR '26 Scorpion edition or buy the Zhangxue 500RR?'. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | No translated sentence is provided; only a meta description of the source to be translated. |
| tencent/HY-MT1.5-7B | Guys, I'm wondering whether to wait for the new Spring Breeze 450SR 26 Scorpion model or just buy the Xue 500RR. I'm new to this, so I need some advice. |
严重
[术语]
"Spring Breeze 450SR"
理由: '春风' is the CFMOTO brand and should be kept as 'CFMOTO'; the literal rendering 'Spring Breeze' loses brand recognition. | Should be 'CFMOTO 450SR '26 Scorpion edition' not 'Spring Breeze'. The reference shows the correct brand name is CFMOTO. | Mistranslation of the brand '春风' (CFMOTO) as 'Spring Breeze'.
严重
[术语]
"Xue 500RR"
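理由: Mistranslation of the brand '张雪' (Zhangxue) as 'Xue'. | Should be 'Zhangxue 500RR'. The reference uses the full name 'Zhangxue'. | '张雪500rr' should be transliterated or kept as a brand/model name; 'Xue 500RR' does not convey that it is a specific brand and model, and the character '张' is dropped. |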
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | How is everyone codex kicking? They all say I use a lot of volume 😂 |
严重
[准确性]
"How is everyone codex kicking?"
理由: Mistranslation of '蹬' - reference correctly uses 'freeriding/spamming', hypothesis incorrectly translates as 'kicking' | Mistranslation of '蹬' (slang for freeriding/spamming/using heavily) as 'kicking'. | '蹬' here means freeriding/spamming/using a lot of Codex, not 'kicking'; intent is misrepresented. | Should be 'How is everyone's Codex freeriding/spamming going' to match reference |
| CohereLabs/tiny-aya-water | How is everyone codex kicking? Everyone says I use a lot of volume 😂 |
严重
[准确性]
"How is everyone codex kicking? Everyone says I use a lot of volume 😂"
理由: Incorrect translation of '蹬' - in this context it means 'freeriding/spamming' (taking advantage of free usage) not 'kicking' | Incorrect translation of '蹬': should be 'freeriding/spamming' not 'kicking'. Missing content about Claude Code for academic writing. Incomplete translation. | Mistranslation of '蹬' (slang for freeriding/spamming/using heavily) as 'kicking'. 'Kicking' makes no sense in this context. |
| Qwen/Qwen2.5-14B-Instruct | Title: How is everyone's codex usage going? They all say I use more 😂 Content: Here's a picture first... Urgently seeking a model for academic writing using Claude Code, Skills, MCP process |
严重
[准确性]
"codex usage"
理由: Should be 'How is everyone's Codex freeriding/spamming going?' not just 'usage going' | Source “蹬” implies freeriding/spamming or heavy use; “usage” is more neutral and loses that nuance. | '蹬' is slang for freeriding, spamming, or using heavily/abusing. 'Usage' is too neutral. |
| Qwen/Qwen2.5-7B-Instruct | Title: How is everyone doing with their LinuxDo? They all say I use a lot! |
致命
[准确性]
"LinuxDo"
理由: Should be 'Codex' as in the reference. | The title's subject 'codex蹬' (freeloading / account sharing) is mistranslated as 'LinuxDo', which replaces the topic of discussion entirely and matches neither the source nor the reference. | Should be 'How is everyone's Codex freeriding/spamming going?' not 'LinuxDo'. The hypothesis is completely incorrect and missing the content section. | Completely wrong translation: 'codex蹬的怎么样' refers to 'Codex freeriding/spamming' (using Codex for free), not 'LinuxDo'. This is a complete mistranslation of the main subject. | Hallucination/Mistranslation. 'codex 蹬' (Codex freeriding/spamming) was incorrectly translated as 'LinuxDo' (likely a forum name confusion). |
| Qwen/Qwen3-14B | Come usate tutti il codex? Dicono tutti che uso molto 😂 |
Major
[Accuracy]
"Come usate tutti il codex? Dicono tutti che uso molto 😂"
Rationale: Only roughly matches the title; the body about urgently seeking models, Skills, and MCP workflows for using Claude Code in academic writing is completely omitted. | The term 'codex' should be 'Codex' as in the reference. | The hypothesis only translates the title and omits the content paragraph requesting Claude Code resources. | Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: How is everyone doing with Codex? They all say I use a lot 😂 Content: First, here's a screenshot... Urgently seeking models for academic writing using Claude Code, Skills, MCP workflow. |
Major
[Accuracy]
"doing with Codex"
Rationale: Incorrect translation of '蹬' - hypothesis uses neutral 'doing' but reference correctly interprets as 'freeriding/spamming' which is the slang meaning in this context | Misses "freeriding/spamming" nuance in 蹬, which implies heavy/abusive free usage | Should be 'How is everyone's Codex freeriding/spamming going?' - missing the slang meaning |
| Qwen/Qwen3.5-9B | Title: How is everyone's codex 'deng'? They all say I use a lot 😂 Content: Let me show some images first... Urgently seeking a model for academic writing using Claude Code, Skills, and MCP workflow. |
Major
[Accuracy]
"codex 'deng'"
Rationale: Incorrect translation: '蹬' in this context means 'freeriding/spamming' (using without paying), not transliterated 'deng'; reference correctly uses 'freeriding/spamming' | Fails to translate the slang '蹬' (freeriding/spamming/using heavily). Leaving it as 'deng' makes no sense. | Should be 'How is everyone's Codex freeriding/spamming going?' - 蹬 is slang for heavy usage/abuse | “蹬” here means freeloading or exploiting free offers (薅/白嫖, 薅羊毛); the reference has "freeriding/spamming", and keeping the pinyin "deng" fails to convey the meaning. | Incorrect term; should be 'freeriding/spamming' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: How is everyone doing at Codex? They all say I use it too much 😂 Content: I'll go ahead and upload some images. Urgently need to use Claude Code for academic writing models, Skills, MCP process |
Major
[Accuracy]
"How is everyone doing at Codex? They all say I use it too much 😂"
Rationale: 'doing at Codex' is inaccurate; should be 'Codex freeriding/spamming' to capture the slang meaning | Misinterprets '蹬' (pedaling/freeriding/spamming) as general 'doing'. In this context, it refers to abusing the free tier. | Incorrect translation of '蹬' - should be 'freeriding/spamming' (using without paying), not 'doing' |
| google/gemma-3-12b-it | Title: How is everyone doing with the codex? They all say I use quite a lot 😂 Content: Let me post some pictures first... Urgently requesting a model for academic writing using Claude Code, Skills, MCP process. |
Major
[Accuracy]
"How is everyone doing with the codex?"
Rationale: '蹬' in this slang context likely refers to 'freeriding', 'spamming', or 'abusing' the service, not just 'doing'. The reference 'freeriding/spamming' captures the negative/consumptive connotation better. | 蹬 here carries a nuance of 'freeriding/spamming'; simply 'doing with the codex' under-translates that connotation. | Should be 'How is everyone's Codex freeriding/spamming going?' to capture the slang meaning |
| google/gemma-3-1b-it | Title: Everyone's Codex Duck How Good Is It? They All Say My Quant is Big 😂 |
Major
[Accuracy]
"Everyone's Codex Duck How Good Is It?"
Rationale: Incorrect translation: '蹬' is internet slang meaning 'freeriding/spamming' (using without paying), not 'Duck'. '用量比较大' means 'usage is pretty high', not 'Quant is Big'. | Completely mistranslates slang; “蹬的怎么样” refers to using/freeriding/spamming Codex, not “Codex Duck How Good Is It?” which is nonsensical. | Hypothesis completely mistranslates '蹬'. Should be 'freeriding/spamming' not 'Duck'. This is a major comprehension failure. | Mistranslated the slang '蹬' (freeride/spam/use heavily) as 'Duck', likely a phonetic or character confusion error. |
| google/gemma-3-4b-it | How is everyone’s Codex ‘deng’? They all say I use a lot of quantity 😂 |
Major
[Accuracy]
"How is everyone's Codex 'deng'? They all say I use a lot of quantity 😂"
Rationale: Completely incorrect. '蹬' should be translated as 'freeriding/spamming' (reference), not left as 'deng'. The phrase means 'how is everyone's Codex usage/freeriding going', not a literal transliteration. | Incorrect translation: '蹬' in this context is internet slang for 'freeriding/spamming' (using without paying), not literal 'deng' | The term 'deng' is unclear and less specific than the reference's 'freeriding/spamming.' | '蹬' is slang here, likely meaning 'freeriding', 'spamming', or 'using heavily/abusing'. Translating it as 'deng' (pinyin) leaves the meaning completely obscure to an English reader. |
| google/translategemma-12b-it | Title: How is everyone using Codex? They all say I use it quite a bit 😂 Content: I'll post some screenshots first... Urgently seeking models for academic writing using Claude Code, Skills, and MCP process. |
Major
[Accuracy]
"How is everyone using Codex?"
Rationale: '蹬' is slang here likely meaning 'freeriding', 'spamming', or 'using aggressively'. 'Using' is too neutral and misses the negative/connotative slang meaning. | Incorrect translation: The source '蹬' is slang meaning to 'freeride' or 'spam/use excessively', not just 'use'. Reference correctly translates as 'freeriding/spamming' | Should be 'How is everyone's Codex freeriding/spamming going?' per reference. Misses the slang connotation of '蹬'. |
| google/translategemma-4b-it | How is everyone's experience with using the 'LinuxDo codex'? They say I use a lot of resources 😂 |
Major
[Accuracy]
"'LinuxDo codex'"
Rationale: Source and reference just say "Codex" and imply freeriding/spamming; adding "LinuxDo" specifies a platform not mentioned, introducing unsupported detail. | Incorrect translation. Should be 'How is everyone's Codex freeriding/spamming going?' not 'experience with using the LinuxDo codex'. The hypothesis misses the slang meaning of '蹬' (freeriding/spamming). | Incorrect translation - 'codex蹬' is slang meaning 'freeriding/spamming' (using without paying), not 'LinuxDo codex'. The reference correctly translates as 'Codex freeriding/spamming' | Hallucinated 'LinuxDo'. The source '蹬' is slang for freeriding/spamming/using heavily, not a platform name. |
| tencent/HY-MT1.5-1.8B | Translation of '蹬' in English |
Major
[Accuracy]
"Translation of '蹬' in English"
Rationale: Hypothesis only provides a single word translation request rather than translating the full title. The reference shows '蹬' should be translated as 'freeriding/spamming' in context, not just the isolated character. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | Only states that it is a translation of the word, but does not provide an actual English equivalent in context (e.g., ‘freeriding/spamming/using heavily’). | Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'How is everyone's Codex freeriding/spamming going?'. | The hypothesis does not match the reference content. |
| tencent/HY-MT1.5-7B | "en": "Title: How is everyone doing with using the Claude Code? They all say I use it quite a lot😂", "zh-TW": "Title: 大家使用Claude Code的情况如何?他们都说我用得比较多😂", "ja": "Title: みんなClaude Codeの使い方はどうですか?みんな私が結構多く使っていると言っています😂", "ko": "Title: 여러분이 Claude Code를 어떻게 사용하고 계신가요?다들 제가 꽤 많이 사용한다고 하네요😂", "de":="Titel: Wie geht es allen mit der Nutzung des Claude Code? Sie sagen alle, dass ich ihn ziemlich häufig verwende😂", "fr":="Titre: Comment se passe-t-il pour tout le monde avec l’utilisation du Claude Code ? Tout le monde dit que j’en utilise assez souvent😂", "it":="Titolo: Come va a tutti con l’uso del Claude Code? Dicono tutti che lo uso abbastanza spesso😂"} |
Major
[Accuracy]
"using the Claude Code"
Rationale: The source “codex蹬的怎么样” asks how people's freeloading (蹭/白用) on Codex usage is going; it was misread as “using Claude Code”, which swaps in the name of a different tool and changes the subject. | Mistranslation of 'codex 蹬' (Codex freeriding/spamming) as 'using the Claude Code'. The subject changed from Codex to Claude Code, and the action '蹬' (freeride) is lost. | Should be 'How is everyone's Codex freeriding/spamming going?' The source refers to 'codex蹬' which means freeriding/spamming, not general usage of Claude Code. |
| Model under test | Machine translation output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: Are you using a hub or a VPN? I am. |
Critical
[Accuracy]
"Are you using a hub or a VPN? I am."
Rationale: Hallucination. The source asks about '中转站' (API proxy sites), but the hypothesis invents a question about hubs/VPNs and adds 'I am'. | Source asks if others are using API proxy sites as a cost-saving method; 'hub or VPN' and mentioning 'I am' introduces unsupported information and changes meaning. | Completely incorrect; should be 'Are you bros using API proxy sites?' | Completely wrong translation - source asks about '中转站' (API proxy sites), hypothesis translates as 'hub or VPN' which is completely different |
| CohereLabs/tiny-aya-water | Content: Are you using a transfer station? |
Minor
[Terminology]
"transfer station"
Rationale: Incorrect terminology for the context of API proxies. | Should be 'API proxy site' to match source terminology '中转站' in this context | The correct term is 'API proxy sites', not 'transfer station'. | Literal translation of "中转站"; "API proxy site" or "relay" would be clearer in this technical context.
Major
[Accuracy]
"Are you using a transfer station?"
Rationale: Omission of the Title. Mistranslation of '中转站' (proxy/relay station) as 'transfer station' (which implies logistics/transport) instead of 'API proxy'. | Incomplete translation. Missing title entirely. 'transfer station' should be 'API proxy sites'. Should include context about OpenClaw consuming tokens. | Incomplete translation - missing the Title content about 'OpenClaw consumes too many tokens' and 'cost-effective solutions domestically' |
| Qwen/Qwen2.5-14B-Instruct | Title: Guys, lobster is too expensive in terms of tokens, is there any cost-effective solution domestically? Content: Are you guys using relay stations? | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Guys, lobsters are too expensive for tokens, is there any cost-effective solution in China? |
Major
[Accuracy]
"lobsters"
Rationale: “龙虾” is a nickname for OpenClaw, not actual “lobsters”; translating it literally as the animal misleads readers about what is being discussed. | Literal translation of '龙虾' (lobster/crayfish) missing the metaphor for 'OpenClaw'. Also syntax 'expensive for tokens' is awkward. | Should be 'OpenClaw consumes too many tokens'. '龙虾' (lobster/OpenClaw) is slang for OpenClaw, and the meaning is about token consumption, not expense. | Mistranslation: '龙虾' (lobsters) is the Chinese nickname for 'OpenClaw', not literal lobsters. Reference correctly uses 'OpenClaw'. |
| Qwen/Qwen3-14B | Signori, il granchio è troppo costoso in termini di token, c'è una soluzione conveniente nel paese? |
Major
[Accuracy]
"il granchio"
Rationale: Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. Also 'granchio' (crab) is incorrect - '小龙虾' refers to OpenClaw. | ‘龙虾’ here is the nickname for ‘OpenClaw’ (an AI tool), not literally ‘the crab’; wrong reference and animal. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. Also '龙虾' (OpenClaw) mistranslated as 'granchio' (crab) | '龙虾' (Lobster/Crayfish) is the nickname for the tool 'OpenClaw'. Translating it literally as 'crab' (granchio is actually crab, lobster is aragosta) loses the reference to the specific tool. | The term 'granchio' is less precise than 'OpenClaw' in the reference. |
| Qwen/Qwen3.5-4B | Guys, Lobsters are too expensive on tokens. Is there any cost-effective solution available domestically? |
Major
[Accuracy]
"Lobsters"
Rationale: Should be 'Bros, OpenClaw consumes too many tokens.' - wrong product name and phrasing | 龙虾 is community nickname for OpenClaw, not literal lobsters; loses product reference and context | Incorrect translation of '龙虾' - hypothesis uses 'Lobsters' but reference correctly uses 'OpenClaw' which is the actual brand/tool name | Failed to recognize '龙虾' as the nickname for 'OpenClaw'. |
| Qwen/Qwen3.5-9B | Title: Guys, using LinuxDo is too expensive on tokens. Are there any cost-effective solutions available domestically? |
Critical
[Accuracy]
"LinuxDo"
Rationale: In the source, “龙虾” is the colloquial name for OpenClaw; mistranslating it as LinuxDo makes the software/service referred to completely wrong. | Should be 'OpenClaw consumes too many tokens' - 龙虾 refers to OpenClaw, not LinuxDo | Completely incorrect: '龙虾' refers to 'OpenClaw', not 'LinuxDo'; this is a major mistranslation of the subject | Incorrect term; should be 'OpenClaw' | Completely mistranslates '龙虾' (OpenClaw/Lobster) as 'LinuxDo' (which is likely the forum name, not the tool consuming tokens). |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Oh, shrimp is too expensive in tokens, any good deals in China? |
Minor
[Accuracy]
"shrimp"
Rationale: In this community “龙虾” refers to the OpenClaw tool; the literal “shrimp”, which also changes the animal (小龙虾/龙虾, crayfish/lobster), completely loses the established in-joke pointing to the product. | 'shrimp' is unclear slang; should be 'OpenClaw'. 'any good deals in China' is imprecise; should be 'cost-effective solutions domestically' | Incorrect translation of '龙虾' - should be 'OpenClaw' (the software name), not 'shrimp' | Mistranslates '龙虾' (OpenClaw) as 'shrimp'. Adds 'Oh' which is not in source. Loses the specific tool reference. |
| google/gemma-3-12b-it | Hey folks, lobsters consume too many tokens; are there any cost-effective solutions available domestically? |
Major
[Terminology]
"lobsters"
Rationale: 'Lobsters' is incorrect - reference uses 'OpenClaw' which is the actual product name. '龙虾' is a nickname for OpenClaw. | 龙虾 is community slang for 'OpenClaw'; translating as 'lobsters' loses the specific product reference. | The term 'lobsters' is used instead of 'OpenClaw', which is a critical error as it misrepresents the subject. | Should be 'OpenClaw' not 'lobsters'; this is a specific tool reference |
| google/gemma-3-1b-it | Are you guys using a proxy for the token? |
Major
[Accuracy]
"Are you guys using a proxy for the token?"
Rationale: Mistranslated '中转站' (API proxy site/relay station) as 'proxy for the token', altering the technical meaning. Also omits the title translation. | Misinterprets “中转站” which refers to API proxy sites; “proxy for the token” is incorrect and omits mention of OpenClaw and token consumption context in the title. | Hypothesis is incomplete and misses context. Should be 'Are you bros using API proxy sites?' and should reference the title about OpenClaw consuming tokens. |
| google/gemma-3-4b-it | Guys, lobster is too expensive in terms of tokens, does China have any cost-effective solutions? |
Major
[Accuracy]
"lobster"
Rationale: "龙虾" in this community refers to the OpenClaw project, not literal lobster; translation loses the software/product reference and sounds literal. | Again, '小龙虾' refers to the software 'OpenClaw'. Literal translation 'lobster' is incorrect in this technical context. | Should be 'OpenClaw' - '龙虾' is a colloquial reference to the OpenClaw tool, not a literal lobster. |
| google/translategemma-12b-it | Hey everyone, lobsters consume too many tokens. Are there any cost-effective solutions available domestically? |
Major
[Accuracy]
"lobsters"
Rationale: Critical mistranslation: '龙虾' here is slang for 'OpenClaw' (software), not literal lobsters. Reference correctly identifies it as 'OpenClaw' | Should be 'OpenClaw' as per reference. | Should be 'OpenClaw' per reference. '龙虾' is slang for OpenClaw, not literal lobsters. | '龙虾' refers to the tool 'OpenClaw'. Translating as the animal 'lobsters' is incorrect. | "龙虾" here is slang for a specific model/service (e.g., OpenClaw) rather than literal lobsters; translating it literally removes the intended referent. |
| google/translategemma-4b-it | Title: Guys, lobster is too expensive in terms of tokens, does China have any cost-effective solutions? |
Major
[Accuracy]
"lobster"
Rationale: "龙虾" is community slang for the product "OpenClaw"; literal "lobster" loses the intended reference to the tool/service. | Failed to translate the slang '龙虾' (OpenClaw) correctly, using the literal animal name instead. | Incorrect translation of '龙虾'. Should be 'OpenClaw consumes too many tokens' not 'lobster is too expensive in terms of tokens'. '龙虾' is slang for OpenClaw, not literal lobster. |
| tencent/HY-MT1.5-1.8B | Translation of '佬们,龙虾太费 token 了,国内有没有什么划算的方案?' in English |
Minor
[Accuracy]
"Translation of '佬们,龙虾太费 token 了,国内有没有什么划算的方案?' in English"
Rationale: No actual translation is given, only a meta description that the phrase is to be translated. | Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'Bros, OpenClaw consumes too many tokens. Are there any cost-effective solutions domestically?'. |
| tencent/HY-MT1.5-7B | "en": "Title: Guys, using Linux for lobster is too costly in terms of tokens. Are there any cost-effective solutions domestically?", "zh-TW": "Title: Guys, using Linux for lobster is too expensive in terms of tokens. Are there any cost-effective solutions domestically?", "ja": "Title: Guys, using Linux for lobster is too costly in terms of tokens. Are there any cost-effective solutions domestically?", "ko": "Title: Guys, using Linux for lobster is too costly in terms of tokens. Are there any cost-effective solutions domestically?", "de":="Titel: Leute, die Nutzung von Linux für lobster ist hinsichtlich der Tokens zu teuer. Gibt es im Inland irgendwelche kostengünstigen Lösungen?", "fr": "Titre: Messieurs, l’utilisation de Linux pour lobster est trop coûteuse en termes de tokens. Existe-t-il des solutions rentables sur le marché national ?", "it": "Titolo: Ragazzi, l’uso di Linux per lobster è troppo costoso in termini di token. Esistono soluzioni più economiche nel mercato interno?"}|im_end|> |
Critical
[Accuracy]
"using Linux for lobster"
Rationale: In the source, “龙虾” refers to the OpenClaw product, not “using Linux for lobster”; the added “using Linux” is fabricated and seriously ambiguous. | Should be 'OpenClaw consumes' not 'using Linux for lobster'. The hypothesis completely misinterprets the source. '龙虾' (lobster) is a colloquial reference to OpenClaw. | Severe hallucination. '龙虾' refers to 'OpenClaw', and '佬们' is 'Bros/Users'. The hypothesis invents 'using Linux for lobster' which makes no sense in context. |
| Model under test | Machine translation output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Invalid input. Please enter a valid email address. |
Critical
[Accuracy]
"Invalid input. Please enter a valid email address."
Rationale: Completely wrong output - hypothesis is an error message instead of translation | Completely unrelated to the job posting about a resident engineer in Shanghai; full mistranslation/placeholder. | Irrelevant and promotional. | Completely irrelevant response; should translate the job posting about video codec engineer |
| CohereLabs/tiny-aya-water | Title: Hiring in Shanghai for a person who understands video stream encoding and decoding with car navigation experience |
Major
[Accuracy]
"car navigation experience"
Rationale: Incorrect translation of '车载经验' - means 'in-vehicle system experience' not 'car navigation experience' | "车载经验" refers to in-vehicle systems experience, not specifically car navigation; narrows the meaning incorrectly. | Mistranslation of '车载经验' (in-vehicle/automotive experience) as 'car navigation'. The source lists AVM/DVR/Codecs, which is broader than just navigation.
Minor
[Terminology]
"Hiring in Shanghai for a person who understands video stream encoding and decoding with car navigation experience"
Rationale: 'car navigation experience' is imprecise. Should be 'in-vehicle system experience'. Minor terminology issue. | The correct term is 'in-vehicle system', not 'car navigation'. | Should be 'video streaming' to match source '视频流' |
| Qwen/Qwen2.5-14B-Instruct | Title: Seeking an individual familiar with video stream encoding and decoding with automotive experience for on-site work in Shanghai Content: Friends, looking for someone who works at a car manufacturer related to video stream encoding and decoding, proficient in AI coding. The focus is on AVM DVR applications. Understanding RTSP protocol, H265 format, watermark and texture addition, knowledge of Qualcomm hardware encoding and decoding. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Shanghai Hiring for Video Stream Encoding and Decoding with Automotive Experience Content: Fellow seniors, looking to hire someone from an automotive manufacturer working on video stream encoding and decoding, familiar with AI coding. Focus is on AVM DVR applications. Knowledge of RTSP protocol, H265 format, watermark texture addition, and understanding of Qualcomm hardware encoding and decoding is required. |
Minor
[Style]
"Fellow seniors"
Rationale: A more natural translation of “佬友们” is “bros/folks/guys”; “Fellow seniors” sounds unnatural, though it does not impede understanding. | Translation of '佬友' as 'Fellow seniors' is slightly off; reference uses 'Bros'. Also 'automotive manufacturer' vs 'OEM' (original equipment manufacturer) - both acceptable but different. | Too formal/academic. 'Bros' fits '佬友们'. |
| Qwen/Qwen3-14B | ...} |
Critical
[Accuracy]
"...}"
Rationale: The translation is incomplete and does not match the reference. | Hypothesis is incomplete/malformed JSON fragment. Should be a complete English translation of the Chinese content | The hypothesis is a broken string containing XML tags and ellipsis, failing to provide any translation of the source text. |
| Qwen/Qwen3.5-4B | Title: Hiring one person in Shanghai who understands video stream encoding/decoding and has automotive experience Content: Hi everyone, we're looking for someone from an OEM to work on video stream encoding/decoding related tasks, proficient in AI coding. The focus is on AVM DVR applications. Knowledge of RTSP protocol, H.265 format, watermarking, texture overlay, and familiarity with Qualcomm hardware encoding/decoding is required. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Seeking a resident engineer in Shanghai with experience in video stream encoding/decoding and automotive applications Content: Hello everyone, looking for someone from an OEM working on video stream encoding/decoding tasks, familiar with AI coding. Focus is on AVM DVR application. Knowledge of RTSP protocol, H.265 format, watermarking, texture overlay, and familiarity with Qualcomm hardware encoding/decoding. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Looking for an experienced encoder with car-mounted experience for streaming decoding. |
Major
[Accuracy]
"Looking for an experienced encoder with car-mounted experience for streaming decoding."
Rationale: Translation is incomplete - missing key details: '上海招一个' (hiring in Shanghai), '懂视频流编解码有车载经验的驻场' (resident engineer with video streaming codec and in-vehicle experience) | Incomplete and inaccurate. Missing location (Shanghai), job type (resident engineer), and specific technical requirements (RTSP, H265, Qualcomm) | Severe omission. The hypothesis summarizes the title poorly and completely omits the detailed Content section which contains critical job requirements (AI coding, AVM DVR, RTSP, H265, Qualcomm, etc.). |
| google/gemma-3-12b-it | Title: Seeking a Resident Engineer in Shanghai with In-Vehicle Video Streaming Encoding/Decoding Experience Content: Hey folks, we're looking for someone at a host manufacturer to work on video streaming encoding/decoding related tasks, and who is familiar with AI coding. The direction is AVM DVR applications. Knowledge of RTSP protocol, H.265 format, watermark and texture addition, and understanding of Qualcomm hardware encoding/decoding are required. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Translation of '上海招一个 懂视频流编解码有车载经验的驻场' in en |
Major
[Accuracy]
"Translation of '上海招一个 懂视频流编解码有车载经验的驻场' in en"
Rationale: Missing translation of the content section. | The hypothesis is a meta-commentary describing the task rather than providing the actual translation of the source text. | Not a translation; fails to express that Shanghai is hiring a resident engineer with video streaming codec skills and in-vehicle experience. | Hypothesis is incomplete and does not provide actual translation. Should translate to 'Hiring a resident engineer in Shanghai with video streaming codec skills and in-vehicle system experience' as shown in reference. | Hypothesis is not a translation but a placeholder indicating translation task. Missing actual translation content. |
| google/gemma-3-4b-it | Title: Seeking a Video Stream Encoding Expert with Automotive Experience in Shanghai Content: Hello everyone, we’re looking for a video stream encoding specialist working at a host factory, with experience in AI coding. The direction is AVM DVR application. Familiar with RTSP protocol, H.265 format, watermark texture addition, and Qualcomm hardware encoding. | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Seeking a full-time engineer in Shanghai with video stream encoding/decoding experience and automotive background. |
Major
[Accuracy]
"Seeking a full-time engineer in Shanghai with video stream encoding/decoding experience and automotive background."
Rationale: Incorrect translation: '驻场' means 'resident (engineer)' not 'full-time engineer'; reference uses 'resident engineer' | '驻场' means 'on-site' or 'resident', not necessarily 'full-time' (though often implied). 'Resident engineer' is more accurate. | Hypothesis is missing the content section entirely. Only provides title, missing substantial content about job requirements. |
| google/translategemma-4b-it | Title: Seeking a Video Stream Encoding Expert with Automotive Experience in Shanghai Content: Hello everyone, we are looking for an automotive industry professional with experience in video stream encoding and AI coding. The position is related to AVM DVR applications. Must have knowledge of RTSP protocol, H.265 format, watermark and texture addition, and familiarity with Qualcomm hardware encoding. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '上海招一个 懂视频流编解码有车载经验的驻场' in en |
Minor
[Accuracy]
"Translation of '上海招一个 懂视频流编解码有车载经验的驻场' in en"
Rationale: The hypothesis only describes that it is a translation, without providing the English translation content. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'Hiring a resident engineer in Shanghai with video streaming codec skills and in-vehicle system experience'. |
| tencent/HY-MT1.5-7B | Assistant | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Claude Code vs Google Ultra for Carpooling, Which is More Reliable? | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Claude Code vs Google Ultra Carpooling, Which is More Stable? |
Minor
[Terminology]
"Carpooling"
Rationale: 'Carpooling' is colloquial translation. Should be 'shared subscription' for clarity and consistency with reference. | The correct term is 'shared subscription', not 'Carpooling'. | "拼车" in this context means shared subscription; "carpooling" is literal and may confuse readers, though somewhat understandable. |
| Qwen/Qwen2.5-14B-Instruct | Title: Which is more stable between Claude Code Carpool and Google Ultra Carpool Content: Between Claude Code Carpool and Google Ultra Carpool, which one should you choose? In terms of price, Google Ultra definitely has an advantage... The coding and engineering capabilities of Claude Code don't need to be mentioned, but the price is quite high, and personal usage of carpooling is much less, so without experiencing it, you won't know. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Claude Code Carpool and Google Ultra Carpool - Which is More Stable Content: When it comes to choosing between Claude Code Carpool and Google Ultra Carpool, Google Ultra definitely has an edge in terms of price... Claude Code's coding and engineering capabilities are beyond question, but the price is a bit high, and carpooling options for personal use are much fewer; I haven't carpooled before so I'm not quite sure. |
Minor
[Terminology]
"Carpool"
Rationale: Terminology: 'Carpool' is used but reference uses 'shared subscription' which is the standard term in this community context. | While literal, 'shared subscription' is the standard term for '拼车' in this SaaS context. | Here “拼车” is internet slang for “shared subscription/shared account”, not actual carpooling; “carpool” would seriously mislead readers about the type of service. | Should be 'Claude Code shared subscription' and 'Google Ultra shared subscription', not 'Carpool'. '拼车' in this context means shared subscription accounts, not carpooling. |
| Qwen/Qwen3-14B | Condivisione del viaggio con Claude Code e Google Ultra: quale è più stabile? |
Major
[Accuracy]
"Condivisione del viaggio"
Rationale: The term 'Condivisione del viaggio' is less precise than 'shared subscription' in the reference. | The hypothesis only translates the title and omits the content paragraph discussing price and features. | Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. | Mistranslates ‘拼车’ (shared subscription / account sharing) as ‘car ride sharing’; domain-specific metaphor is not preserved. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Which is more reliable, Claude Code carpooling or Google Ultra carpooling? Content: When choosing between Claude Code and Google Ultra carpooling, Google Ultra definitely has an advantage in pricing... |
Minor
[Terminology]
"carpooling"
Rationale: Inconsistent translation of '拼车' - hypothesis uses 'carpooling' while reference uses 'shared subscription' | Literal rendering of 拼车; in this context "shared subscription" or "shared account" is clearer industry term | '拼车' in the context of AI accounts means 'shared subscription' or 'group buy', not literal 'carpooling'. |
| Qwen/Qwen3.5-9B | Title: Which is more stable between Claude Code carpooling and Google Ultra carpooling? Content: When choosing between Claude Code carpooling and Google Ultra carpooling, Google Ultra definitely has an advantage in price... Claude Code's coding and engineering capabilities go without saying, but the price is a bit high. Carpooling usage for individuals is also much lower, and those who haven't tried it don't really understand. |
Minor
[Terminology]
"carpooling"
Rationale: While '拼车' literally means carpooling, in the context of accounts/subscriptions, 'shared subscription' or 'group buy' is the standard term. | Changed 'shared subscription' to 'carpooling' - terminology inconsistency with reference | In the source, “拼车” is a metaphor for a shared subscription or co-rented account; the reference uses "shared subscription", and the literal "carpooling" is slightly unnatural in this technical context. | Should be 'shared subscription' - 拼车 in this context means shared account subscription, not carpooling |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Claude Code vs. Google Ultra: Which Ride-Sharing Service is More Stable Content: Between Claude Code and Google Ultra, when it comes to ride-sharing, Google Ultra definitely has the edge in pricing... Claude Code's coding and engineering capabilities are unmatched, but its price point is relatively high, and for personal use, the number of rides you'd likely take is much lower. In fact, if you haven't tried it yourself, you might not even realize how unclear things can get. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: Claude Code Rideshare vs. Google Ultra Rideshare - Which is More Reliable? |
Minor
[Terminology]
"Rideshare"
Rationale: Should be 'shared subscription' not 'Rideshare' | 拼车 for subscriptions is closer to 'shared subscription' or 'cost-sharing'; 'rideshare' is metaphorical and could confuse outside the original slang context. | '拼车' in the context of accounts/subscriptions is better translated as 'shared subscription' (as in reference). 'Rideshare' usually refers to physical carpooling and might be confusing. | Reference uses 'shared subscription' not 'rideshare' which is the more accurate technical term. |
| google/gemma-3-1b-it | Claude Code and Google Ultra car comparison: Which is more stable? | 通过 (无共识错误) |
| google/gemma-3-4b-it | Claude Code vs. Google Ultra Carpool: Which is more reliable? | 通过 (无共识错误) |
| google/translategemma-12b-it | Which is more reliable: Claude Code ride-sharing or Google Ultra ride-sharing? | 通过 (无共识错误) |
| google/translategemma-4b-it | Claude Code vs. Google Ultra Carpooling: Which is more reliable? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '拼车' with LinuxDo community context", "zh-TW": "LinuxDo社群中的‘拼车’選擇", "ja": "LinuxDoコミュニティでの『パーティー』の選択", "ko": "LinuxDo 커뮤니티에서의 ‘패키지’ 선택", "de": "LinuxDo Gemeinschaftsbeziehung als ‘Paket’", "fr": "LinuxDo communauté pour le choix du ‘Paquet’", "it": "LinuxDo comunità per la scelta del ‘Pacco’” } |
严重
[准确性]
""en": "Translation of '拼车' with LinuxDo community context""
理由: Does not actually translate ‘拼车’ into the intended sense of ‘shared subscription/account pooling’; only meta-commentary is given. | The hypothesis is a malformed JSON object containing meta-commentary and translations into unrelated languages instead of the required English translation. | Hypothesis provides multilingual metadata and context notes rather than a direct English translation. The format is incorrect and does not provide the actual translation of the title. | Hypothesis is a placeholder text with JSON format rather than an actual translation. The reference shows the correct translation should be 'Which is more stable, Claude Code shared subscription or Google Ultra shared subscription?'. | The hypothesis does not match the reference content. |
| tencent/HY-MT1.5-7B | "en": "Title: Which is More Stable, Claude Code Carpooling or Google Ultra Carpooling?", "zh-TW": "Title: Which is More Stable, Claude Code Carpooling or Google Ultra Carpooling?", "ja": "Title: Which is More Stable, Claude Code Carpooling or Google Ultra Carpooling?", "ko": "Title: Which is More Stable, Claude Code Carpooling or Google Ultra Carpooling?", "de":="Titel: Welche ist stabiler, Claude Code Carpooling oder Google Ultra Carpooling?", "fr": "Title: Which is More Stable, Claude Code Carpooling or Google Ultra Carpooling?", "it": "Title: Which is More Stable, Claude Code Carpooling or Google Ultra Carpooling?" }|im_end|> |
严重
[术语]
"Carpooling"
理由: Here '拼车' means pooling a subscription or sharing an account seat; the literal rendering 'carpooling' evokes actual transportation and departs from the subscription-sharing context. | Should be 'shared subscription' not 'Carpooling'. The reference clarifies the correct term is 'shared subscription'. | Mistranslation of '拼车' (shared subscription/account sharing) as 'Carpooling'. While literal, 'shared subscription' is the accurate technical term in this context. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Join our community of domain experts to stay updated on the latest trends and news related to domain blocking and security measures. |
致命
[准确性]
"Join our community of domain experts to stay updated on the latest trends and news related to domain blocking and security measures."
理由: Hallucinated marketing text unrelated to Xboard panel solution against domain blocking; none of the source information is preserved. | Completely wrong translation - hypothesis is generic marketing text unrelated to source content about xboard and GFW blocking | Complete hallucination. The output is a marketing slogan unrelated to the source text about Xboard solutions. | Completely irrelevant; should translate the Xboard GFW blocking solution content |
| CohereLabs/tiny-aya-water | Title: xboard Anti-Domain Blocking Solution Panel Content: Some users may face issues with domain blocking due to their location or network restrictions. Please ensure your internet connection is stable for optimal performance. |
致命
[准确性]
"Some users may face issues with domain blocking due to their location or network restrictions. Please ensure your internet connection is stable for optimal performance."
理由: Completely invented content about generic network issues; omits mention of Xboard, GFW blocking, and shared/personal usage scenarios. | Completely inaccurate. Hypothesis is generic advice unrelated to source. Should discuss GFW blocking of Xboard domains and shared subscription issues. Entirely fabricated content. | This Content is completely fabricated - not present in source at all. Source says '很多自用的也会被阻断' meaning 'even many personal ones get blocked/interrupted' | Complete hallucination. The source discusses Xboard instances being blocked by the GFW (Great Firewall) for shared or personal users. The hypothesis generates generic IT support advice unrelated to the source. |
| Qwen/Qwen2.5-14B-Instruct | Title: xboard Domain Blocking and Interruption Solution Panel Content: Many users have reported that some shared and personal uses of xboard get blocked. Even personal use can be interrupted. |
严重
[准确性]
"Domain Blocking and Interruption"
理由: Title loses explicit reference to GFW/firewall and that this is a panel/solution to prevent blocking; phrasing is unclear and less specific than source. | Should be 'prevent domain names from being blocked by the GFW' not just 'Domain Blocking and Interruption Solution Panel' | The title structure is confusing. It should be 'Solution to prevent domain blocking'. Also misses 'GFW' context which is implied by '被墙'. |
| Qwen/Qwen2.5-7B-Instruct | Title: xboard Domain Block Prevention Panel Solution Content: Many friends have encountered issues where xboard is blocked due to shared hosting or personal use. Even many personal uses are also being blocked. |
轻微
[准确性]
"shared hosting"
理由: Mistranslation: '合租开车' means 'hosting shared subscriptions/driving (sharing)' not 'shared hosting'; '被阻断' means 'get interrupted/blocked' not just 'blocked'. | In context, '合租开车' means "shared subscriptions/shared usage of paid plans", not "shared hosting"; the rendering is related but imprecise. | Ambiguous. The source means 'friends who run shared subscriptions', not that the blocking is caused by shared hosting. |
| Qwen/Qwen3-14B | Title: xboard Anti-Domain Blocking Panel Solution Content: Many users, whether some are using it for shared hosting or personal use, xboard is easily blocked by the firewall. Many personal uses will also be blocked. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: xboard Domain/IP Blocked by GFW Solution Panel Content: Many users, some with shared hosting running cars, some personal use, often get blocked by GFW. Many personal users also face blocking. |
严重
[准确性]
"shared hosting running cars"
理由: Should be 'hosting shared subscriptions' not 'running cars' | 合租开车 here means sharing paid accounts/subscriptions, not "running cars"; mistranslation obscures meaning | Misinterpreted '开车' (slang for hosting/sharing accounts) as literally 'running cars'. |
| Qwen/Qwen3.5-9B | Title: xboard Domain Blocking Solution for Firewalled Domains; Content: Many users, some shared hosting, some self-hosted, xboard is easily blocked by the GFW. Many self-hosted instances also get blocked. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: xboard Domain Firewall Blocking Solution |
严重
[准确性]
"Domain Firewall Blocking"
理由: Incomplete; should specify 'Xboard panel solution to prevent domain names from being blocked by the GFW' | Ambiguous. '防域名被墙阻断' means 'Prevent domain from being blocked by the firewall'. The hypothesis sounds like a solution *for* blocking. Also missing Content translation. | Incorrect translation of '被墙' - should be 'blocked by the GFW' (Great Firewall), not 'Domain Firewall' | The title is '防域名被墙阻断面板解决方案', whose point is preventing the domain from being walled off or blocked; 'Firewall Blocking' in the translation does not convey the specific reference to the GFW, and '面板' (panel) is missing |
| google/gemma-3-12b-it | Title: xboard Domain Blocking and Firewall Solution Panel Content: Many friends, some sharing accounts and others using them personally, find that xboard is easily blocked. Many personal users also experience blocking. |
严重
[准确性]
"Firewall Solution Panel"
理由: The title structure is confusing. It should be 'Solution to prevent domain blocking'. The hypothesis sounds like the panel itself is the problem. | 防域名被墙阻断面板 describes a panel solution to prevent domain blocking by the GFW; 'Firewall Solution Panel' is too generic and drops the explicit 'domain' and GFW blocking aspect. | Should be 'Xboard panel solution to prevent domain names from being blocked by the GFW'
轻微
[风格]
"Many friends"
理由: '佬友' is better rendered as 'bros' or similar slang; 'friends' loses the in-group tone. | Reference uses 'whether they are hosting shared subscriptions or using it personally' which is clearer. | '佬友' should be 'bros' to match the slang tone. |
| google/gemma-3-1b-it | The title 'xboard 防域名被墙阻断面板解决方案' suggests that many users have been blocked by firewalls on xboard. |
轻微
[准确性]
"suggests that many users have been blocked by firewalls on xboard"
理由: The hypothesis is a summary/meta-commentary rather than a direct translation of the source title and content. | Hypothesis is an explanation/interpretation, not a translation. Should provide actual translation of the title and content. | Paraphrases only part of the idea and misrenders the title; the title is about ‘xboard panel solution to prevent domain names from being blocked’, not just that many users have been blocked. | Hypothesis is incomplete and does not provide actual translation. Should translate to 'Xboard panel solution to prevent domain names from being blocked by the GFW' as shown in reference. |
| google/gemma-3-4b-it | Title: xboard Domain Blocking Panel Solution; Content: Many users share cars, and some use them privately. xboard is easily blocked. Many private uses will also be blocked. |
严重
[准确性]
"share cars"
理由: Incorrect metaphor. '合租开车' refers to 'shared subscriptions' not 'sharing cars'. Should be 'whether they are hosting shared subscriptions or using it personally'. | '合租开车' is slang for 'hosting shared subscriptions' (合租) and 'running/operating' (开车) a service. 'Share cars' is a literal mistranslation of the slang. | "合租开车" in this forum context refers to sharing subscriptions/accounts ("carpooling" metaphor), not literally sharing cars; meaning is distorted. |
| google/translategemma-12b-it | Title: xboard Domain Blocking Panel Solution for Circumventing GFW Restrictions Content: Many users, some who share accounts and others who use them privately, find that xboard is easily blocked. Many private users also experience blocking. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: xboard Domain Blocking Solution for GFW |
轻微
[准确性]
"Domain Blocking Solution for GFW"
理由: Missing "panel" / "control panel" aspect of "面板解决方案" from the title; focuses only on domain blocking, not that it's an xboard panel solution. | The title is too brief and does not fully capture the reference's content. | Slightly changes the meaning; the source is about preventing the domain from being blocked, not just a solution for blocking. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: xboard 防域名被墙阻断面板解决方案' in en |
轻微
[准确性]
"Translation of 'Title: xboard 防域名被墙阻断面板解决方案' in en"
理由: Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'Xboard panel solution to prevent domain names from being blocked by the GFW'. | Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | No explicit English translation provided; only a meta-description that it is a translation. |
| tencent/HY-MT1.5-7B | "en": "Title: Solution for Blocking Domain Names from Being Blocked by the Wall in xboard", "zh-TW": "标题:xboard中阻止域名被墙阻挡的解决方案", "ja": "标题:xboardでドメイン名が壁によってブロックされるのを防ぐ方法", "ko": "제목: xboard에서 도메인 이름이 벽에 의해 차단되는 것을 방지하는 방법", "de":="Titel: Lösung zur Verhinderung der Blockierung von Domainnamen durch die ‘Wall’ in xboard", "fr":="Titre : Solution pour empêcher les noms de domaine d'être bloqués par la ‘Muraille’ dans xboard", "it":="Titolo: Soluzione per evitare che i nomi di dominio vengano bloccati dalla ‘Muraglia’ in xboard"}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: About the lack of permissions to operate on small shrimp Content: After deploying version 3.2 of Small Shrimp, the strategy for Small Shrimp changed. During conversations, only textual responses will be given and no actual actions will be taken. If you ask why it doesn't work, it will prompt that there is no corresponding tool. |
严重
[术语]
"small shrimp"
理由: '小龙虾' is a nickname for OpenClaw, not literally 'small shrimp'; loses product reference and slang tone. | Mistranslation of '小龙虾' - reference uses 'OpenClaw', hypothesis uses 'small shrimp' which is literal translation | Literal translation of '小龙虾' (OpenClaw tool name) as 'small shrimp' loses the specific reference to the software. |
| CohereLabs/tiny-aya-water | Title: About the lack of permissions to operate on Shrimp |
严重
[术语]
"Shrimp"
理由: "小龙虾" here is the product "OpenClaw"; translating as generic "Shrimp" loses the tool name and technical context. | The correct term is 'OpenClaw', not 'Shrimp'. | Incorrect translation of '小龙虾' - this is nickname for 'OpenClaw', not literal shrimp. Should be 'OpenClaw'
严重
[准确性]
"About the lack of permissions to operate on Shrimp"
理由: Incorrect terminology: '小龙虾' should be 'OpenClaw' not 'Shrimp'. Missing content details about version 3.2 changes and tool availability issues. | Should be 'no permission' to match source '没有权限' | Literal translation of '小龙虾' (OpenClaw). Should be 'OpenClaw'. |
| Qwen/Qwen2.5-14B-Instruct | Title: About Xiaolongxia Warning No Permission to Operate Content: After deploying version 3.2 of Xiaolongxia, its strategy has changed. During conversations, it will only provide textual responses and not take any actions. Asking why it's not working will prompt that there is no corresponding tool. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: About Xiaolongxiao Prompt No Longer Has Permissions to Operate Content: After deploying version 3.2 of Xiaolongxiao, Xiaolongxiao's policies have changed; during conversations, it will only provide textual responses without taking any actual actions. When asking why it doesn't work, it will prompt that there is no corresponding tool. |
严重
[准确性]
"Xiaolongxiao"
理由: Incorrect transliteration. '小龙虾' refers to 'OpenClaw', not 'Xiaolongxiao'. | Mistranslation: '小龙虾' is the Chinese nickname for 'OpenClaw', not 'Xiaolongxiao'. Reference correctly uses 'OpenClaw'. | Should be 'OpenClaw' not 'Xiaolongxiao'. '小龙虾' is slang for OpenClaw in this community. |
| Qwen/Qwen3-14B | Avviso relativo al mancato permesso per operare su Xiao Longxia |
严重
[准确性]
"Avviso relativo al mancato permesso per operare su Xiao Longxia"
理由: The term 'Xiao Longxia' is less precise than 'OpenClaw' in the reference. | Only a vague title is given; the detailed description about version 3.2, changed strategy, only giving text answers, and prompting lack of tools is omitted. | The hypothesis only translates the title and omits the content paragraph explaining the version 3.2 strategy change. | Wrong target language. The target language is Chinese (zh) but the hypothesis is in Italian. This is a complete language mismatch. Also 'Xiao Longxia' is the pinyin transliteration but should be 'OpenClaw' in English as per the reference. | Hypothesis is in Italian while source is in Chinese and target language is zh (Chinese). Complete language mismatch. |
| Qwen/Qwen3.5-4B | Title: Permission denied to operate on Little Shrimp prompt Content: After deploying version 3.2 of Little Shrimp, its strategy has changed. It will only provide textual responses during conversation and no actual actions. When asked why it is not working, it prompts that there are no corresponding tools. |
严重
[准确性]
"Little Shrimp"
理由: Should be 'OpenClaw' not 'Little Shrimp' | Failed to identify '小龙虾' as the tool 'OpenClaw', translating it literally. | Incorrect translation of '小龙虾' - hypothesis uses 'Little Shrimp' but reference correctly uses 'OpenClaw' which is the actual brand/tool name | 小龙虾 is nickname for OpenClaw product; literal "Little Shrimp" loses the product reference needed in this technical context |
| Qwen/Qwen3.5-9B | Title: Notice about Xiaolongxia lacking permission to perform actions Content: After deploying version 3.2 of Xiaolongxia, its strategy has changed; during conversations, only textual responses are provided without taking actual action. Inquiring why it is not working will prompt a message stating that no corresponding tools are available. |
严重
[术语]
"Xiaolongxia"
理由: Incorrect translation: '小龙虾' is the Chinese name for 'OpenClaw', not transliterated; reference correctly uses 'OpenClaw' | '小龙虾' is a nickname for OpenClaw and should be rendered consistently as OpenClaw, following the reference; transliteration makes the referent unclear. | Should be 'OpenClaw'. Transliterating the nickname reduces clarity for an international audience familiar with the tool's English name. | Should be 'OpenClaw' - 小龙虾 refers to OpenClaw, not a literal lobster |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | About 'Small Crawfish' permission prompt not being able to operate |
严重
[准确性]
"About 'Small Crawfish' permission prompt not being able to operate"
理由: 'Small Crawfish' is awkward; should be 'OpenClaw'. Title structure is awkward; should be 'Regarding OpenClaw prompting that it has no permission to operate' | Mistranslates '小龙虾' (OpenClaw) as 'Small Crawfish'. Missing Content translation. | Incorrect translation of '小龙虾' - should be 'OpenClaw' (the software name), not 'Small Crawfish' |
| google/gemma-3-12b-it | Title: About Xiaolongxia Prompt - No Permission to Operate Content: After deploying version 3.2 of Xiaolongxia, Xiaolongxia's policies have changed. When conversing, it only provides textual answers and does not take any actual actions. Asking him why he is not working will prompt that there are no corresponding tools. |
严重
[术语]
"Xiaolongxia"
理由: 'Xiaolongxia' is the transliteration of '小龙虾' which is a nickname for OpenClaw. Reference uses 'OpenClaw' which is the actual product name. | Should be 'OpenClaw' not 'Xiaolongxia'; this is the English reference name | 小龙虾 here is the product 'OpenClaw'; leaving it as 'Xiaolongxia' in English obscures the established tool name used elsewhere. |
| google/gemma-3-1b-it | Regarding the small shrimp prompt, there is no permission to operate. |
严重
[准确性]
"Regarding the small shrimp prompt, there is no permission to operate."
理由: Hypothesis is incomplete and uses literal translation '小龙虾' as 'small shrimp' instead of 'OpenClaw'. Should provide full translation of the content. | The hypothesis omits the entire content paragraph describing the version update and behavior change. | Misses that “小龙虾” refers to OpenClaw product; literal ‘small shrimp’ is inaccurate in this context and the title should reference OpenClaw's permission prompt. |
| google/gemma-3-4b-it | Title: About crayfish prompt has no permission to operate Content: After deploying version 3.2 of crayfish, the crayfish strategy has changed. When prompted, it only provides textual responses and does not take actual actions. Asking why it doesn’t work will prompt ‘no corresponding tools’. |
严重
[准确性]
"crayfish"
理由: Should be 'OpenClaw' - '小龙虾' is a colloquial reference to the OpenClaw tool, not a literal crayfish. | Repeated error. '小龙虾' = 'OpenClaw' (software), not the animal 'crayfish'. | "小龙虾" refers to OpenClaw software/tool, not literal crayfish; translation loses the product name and confuses the topic. |
| google/translategemma-12b-it | Title: Regarding Xiaolongxia - No permission to operate Content: After deploying version 3.2 of Xiaolongxia, the behavior of Xiaolongxia has changed. When interacting with it, it only provides textual responses and does not take any actual actions. If you ask it why it is not working, it will indicate that there are no corresponding tools. |
严重
[准确性]
"Xiaolongxia"
理由: '小龙虾' is the nickname for 'OpenClaw'. Translating as 'Xiaolongxia' (pinyin) is less helpful than using the project name 'OpenClaw' if known, or at least consistent. Reference uses 'OpenClaw'. | Should be 'OpenClaw' per reference. '小龙虾' is slang for OpenClaw, not a literal product name. | Critical mistranslation: '小龙虾' is a nickname/abbreviation for 'OpenClaw', not a literal translation. Reference correctly uses 'OpenClaw' |
| google/translategemma-4b-it | Title: Regarding Shrimp Permissions Error Content: After deploying version 3.2 of Shrimp, the Shrimp strategy has changed. When interacting, only textual responses are provided, and no actions are performed. Asking why it doesn't work results in an error message indicating that the necessary tools are not available. |
严重
[准确性]
"Shrimp"
理由: Should be 'OpenClaw strategy' not 'Shrimp strategy' | Incorrect translation of '小龙虾'. Should be 'OpenClaw' not 'Shrimp'. '小龙虾' is slang terminology for OpenClaw in this context. | "小龙虾" is community slang for "OpenClaw"; literal "Shrimp" (or crayfish) loses the reference to the specific software/service. | Failed to translate the slang '小龙虾' (OpenClaw) correctly, using the literal animal name. |
| tencent/HY-MT1.5-1.8B | Translation of '关于小龙虾提示没有权限去操作' in en |
轻微
[准确性]
"Translation of '关于小龙虾提示没有权限去操作' in en"
理由: Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'Regarding OpenClaw prompting that it has no permission to operate'. | Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. | The hypothesis lacks an actual translation and only frames the source as something to be translated. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. |
| tencent/HY-MT1.5-7B | Title: Notice about Lack of Permission to Operate on Lobster | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Organize AI model vendors and platforms that can provide long-term stable free quotas (not a public service station) |
轻微
[术语]
"(not a public service station)"
理由: Terminology difference - 'public service station' vs reference 'free API proxy sites' - both are attempts to translate '公益站' but reference is more accurate | '公益站' here are 'free API proxy sites', not 'public service station'; term is misleading in this community context. | Should be 'free API proxy site' to match reference terminology |
| CohereLabs/tiny-aya-water | Title: Organize AI Model Manufacturers and Platforms that Provide Long-term Stable Free API (Not a Public Utility Station) |
严重
[术语]
"Public Utility Station"
理由: "公益站" in this context refers to free API proxy sites; "public utility station" is misleading. | Mistranslation of '公益站'. In this context, it refers to 'free API proxy sites' provided by the community, not a government public utility. | Incorrect translation of '公益站' - means 'free API proxy site' not 'Public Utility Station' |
| Qwen/Qwen2.5-14B-Instruct | Title: List of AI Large Model Providers and Platforms Offering Long-term Stable Free Quotas (Not Public Proxy Sites) Content: With more and more use cases for AI, public proxy sites can sometimes be unstable. Here is a list of providers and platforms that offer relatively long-term stable large model APIs as backups or for testing. |
轻微
[术语]
"Public Proxy Sites"
理由: Should be “free API proxy sites”; “public proxy sites” misses the API aspect which is key in this context. | '公益站' is consistently translated as 'free API proxy sites' in this domain. 'Public Proxy Sites' is slightly off. | Should be 'free API proxy sites' for consistency with source terminology |
| Qwen/Qwen2.5-7B-Instruct | Title: Organize AI Large Model Vendors and Platforms That Provide Stable Free Quotas (Not Public Proxy) Content: Now, there are more and more use cases for AI. Sometimes, public proxies are unstable. So I've compiled some vendors and platforms that can provide relatively stable large model APIs as backups or for testing. |
轻微
[术语]
"Public Proxy"
理由: '公益站' should be "free API proxy sites" rather than a generic public proxy; omitting "API" and "site" makes the domain term inaccurate. | 'Free API proxy sites' is the consistent term used in this domain. | Terminology: 'Public Proxy' vs reference 'free API proxy site'. Also 'Organize' vs 'Compiling a list' - slightly different nuance. |
| Qwen/Qwen3-14B | Title: Organize AI large model vendors and platforms that can provide free quotas (non-free shared API sites) long-term and stably Content: Now there are more and more use cases for AI. Free shared API sites are sometimes unstable. I have compiled some vendors and platforms that can provide relatively long-term stable large model APIs as backups or for testing. | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Organize AI large model vendors and platforms that can provide free long-term stable API quotas (not public proxies). Content: With increasing use cases for AI, public proxies sometimes become unstable. We have compiled some vendors and platforms offering relatively long-term stable large model APIs as backups or for testing. |
轻微
[术语]
"public proxies"
理由: Should be '(Not free API proxy sites)' for consistency with source | Inconsistent translation of '公益站' - hypothesis uses 'public proxies' while reference uses 'free API proxy sites' | Less specific than "free API proxy sites" for 公益站 in this community | While close, 'free API proxy sites' is the consistent term used in this domain for '公益站'. |
| Qwen/Qwen3.5-9B | Title: Organize AI large model vendors and platforms that can provide long-term stable free quota APIs (not public proxies). Content: With increasing use cases for AI, public proxies sometimes become unstable. Here is an organized list of vendors and platforms offering relatively long-term stable large model APIs as backups or testing. |
轻微
[术语]
"public proxies"
理由: Should be 'free API proxy sites' to match reference terminology | '公益站' is consistently rendered as "free API proxy sites" in this text; "public proxies" is too general and does not match the reference. | Inconsistent with the parenthetical clarification and reference; should be 'free API proxy sites'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Organize AI Model Providers and Platforms that Provide Long-Term Stable Free API (Non-Beneficial Site) |
严重
[术语]
"Non-Beneficial Site"
理由: Mistranslates '非公益站'. '公益站' means 'free community/proxy site'. 'Non-Beneficial Site' is nonsensical. Should be 'Not free proxy sites' or similar. | '非公益站' means "not free API proxy sites"; rendering it as "Non-Beneficial Site" completely changes the term's meaning and does not match established community usage | 'Non-Beneficial Site' is awkward phrasing; should be 'Not free API proxy sites' or 'Non-proxy sites' |
| google/gemma-3-12b-it | Title: Organize AI large model vendors and platforms that can provide free quotas stably over a long period of time (not a public proxy). Content: With more and more use cases for AI, public proxies are sometimes unstable. Here’s a list of vendors and platforms that can provide relatively stable large model APIs as backups or for testing. |
轻微
[术语]
"public proxy"
理由: 公益站 in this domain is 'free API proxy site'; 'public proxy' omits 'API' and may suggest a broader network proxy concept than intended. | Reference uses 'Not free API proxy sites' which is clearer. Also 'organize' should be 'compiling' or 'compiling a list'. | '非公益站' is better translated as 'Not free API proxy sites' to be consistent with the community terminology established in other items. | Should be 'Not free API proxy sites' for consistency with reference |
| google/gemma-3-1b-it | Translate the Input String into these languages: en, zh-TW, ja, ko, de, fr, it. |
致命
[准确性]
"Translate the Input String into these languages: en, zh-TW, ja, ko, de, fr, it."
理由: Hypothesis is not a translation but an instruction/placeholder. Missing actual translation content. | Hypothesis does not provide actual translation. It appears to be an instruction rather than a translation. Should translate the source title and content as shown in reference. | This is an instruction, not a translation of the given Chinese title/content; completely unrelated to source meaning about listing AI vendors/platforms with stable free APIs. | The hypothesis includes translations in multiple languages, which does not match the reference. | The hypothesis is an instruction prompt rather than a translation of the source text. |
| google/gemma-3-4b-it | Title: Organize AI Large Model Vendors and Platforms that Can Provide Long-Term Stable Free APIs (Non-Public Proxy Sites) | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Organizing AI large model vendors and platforms that can provide free quotas stably over a long period (not a public service). Content: With increasing use cases for AI, public services are sometimes unstable. Here is a list of vendors and platforms that can provide relatively stable large model APIs for backup or testing. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Organize AI large model vendors and platforms that can provide long-term stable free API (non-public service) |
严重
[准确性]
"Organize AI large model vendors and platforms that can provide long-term stable free API (non-public service)"
理由: Mistranslation of '非公益站'. In this context, it means 'Not free API proxy sites', whereas 'non-public service' implies private/restricted access. | Incorrect translation of '非公益站'. Should be 'Not free API proxy sites' not 'non-public service'. The reference clarifies the meaning more accurately. | "非公益站" means "not free API proxy sites"; translating as "non-public service" is vague and misses the proxy-site contrast central to the context. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 整理下AI大模型厂商和平台,能长期稳定提供免费额度的API (非公益站)' in en |
轻微
[准确性]
"Translation of 'Title: 整理下AI大模型厂商和平台,能长期稳定提供免费额度的API (非公益站)' in en"
理由: Hypothesis is a placeholder text rather than an actual translation. The reference shows the correct translation should be 'Compiling a list of AI large model vendors and platforms that can provide long-term, stable free quota APIs (Not free API proxy sites)'. | No actual translated title is provided; only a meta reference to translation. | The hypothesis is a meta-instruction describing the task rather than the actual translation of the source text. | Hypothesis is incomplete and does not provide an actual translation. It only states 'Translation of...' without delivering the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: Organizing AI Large Model Vendors and Platforms That Can Provide Free Quotas of APIs Stably for a Long Time (Not Public Welfare Sites)", "zh-TW": "Title: 整理AI大模型厂商和平台,能长期稳定提供免费额度的API (非公益站)", "ja": "Title: AI大模型ベンダーとプラットフォームの整理——長期にわたり安定して無料APIを提供できるもの(非公益サイト)", "ko": "Title: AI 대형 모델 공급업체 및 플랫폼 정리 — 장기간 안정적으로 무료 API를 제공할 수 있는 곳들(비공익 사이트)", "de":="Titel: Organisation von AI-Big-Model-Anbietern und -Plattformen, die langfristig und stabil kostenlose API-Kontingente bereitstellen können (nicht öffentliche Wohlfahrtseinrichtungen)", "fr":="Titre : Organisation des fournisseurs et des plateformes de grands modèles d’IA capables de fournir des quotas d’API gratuits de manière stable sur le long terme (non sites de bienfaisance)", "it":="Titolo: Elenco dei fornitori e delle piattaforme di grandi modelli IA in grado di offrire quota di API gratuite in modo stabile e a lungo termine (non siti di pubblica utilità)"}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Asking a question to the big guys, can the bussiness car API be exempt from proxy? |
严重
[准确性]
"bussiness car"
理由: Typographical error; should be 'business'. | Misinterprets '拼车' (carpooling/shared subscription) as 'car'. In this tech context, it refers to sharing a Business plan subscription, not a vehicle. | Mistranslation of context. 'bussiness拼车' refers to a shared Business subscription, not a 'car API'. The hypothesis misses the shared subscription concept. | Mistranslation: 'bussiness拼车' refers to 'Business shared subscription' (a paid group plan), not 'car'. '免代' means 'used without proxy/VPN', not 'exempt from proxy'. The translation distorts the original meaning. | Misinterprets “bussiness拼车” (shared Business subscription) as a literal carpool, changing the meaning. |
| CohereLabs/tiny-aya-water | Title: Asking a question to the big guys, can the bussiness car API be exempt from proxy? |
严重
[准确性]
"bussiness car"
理由: Mistranslation of '拼车' (carpooling/shared subscription) as 'car'. In this context, it refers to sharing a Business plan subscription, not a vehicle. | Misinterprets “拼车” in this context; it refers to a shared Business subscription/account, not a physical car. | Mistranslation of '免代' - should be 'without a proxy/VPN' not 'exempt from proxy'. Also 'bussiness' is misspelled in source but hypothesis should clarify it refers to Business subscription, not a car API. | Mistranslation: 'bussiness拼车' should be 'Business shared subscription', not 'bussiness car'. The term '拼车' in this context refers to shared subscription, not carpooling. Also '免代' means 'without proxy/VPN', not 'exempt from proxy'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Asking for advice, can the business carpool API be exempted? Content: I want to buy a business carpool on Xianyu, but such an API should require a proxy. However, my work environment does not allow proxies. Do you guys have any solutions? |
严重
[术语]
"bussiness拼车"
理由: Misinterpreted as literal carpooling; in context it refers to a shared Business subscription account/slot, not carpooling | Failed to translate the slang '拼车' (carpooling/sharing subscription) into the contextually accurate 'shared subscription'. 'Carpool' is a literal translation that makes no sense for an API service. | Incorrect translation: 'carpool' should be 'shared subscription' or 'shared seat' - in this context 拼车 refers to sharing a paid subscription, not literal carpooling |
| Qwen/Qwen2.5-7B-Instruct | Title: Ask the pros a question, can business carpool API be free from proxy? Content: I want to buy a business carpool on Xianyu, but such an API should require a proxy, but my work environment cannot use a proxy. Do any of you have any solutions? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Chiedo ai grandi esperti una domanda: è possibile aggirare l'API del carpooling aziendale? | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Asking senior members a question, can the carpooling API for business be bypassed? Content: I want to buy a business carpool on Xianyu, but such an API should require a proxy server, however my work environment cannot use proxies. Do any seniors have any solutions? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Asking the experts a question, can the bussiness carpooling API bypass proxies? Content: I want to buy a bussiness carpooling on Xianyu, but such an API should require a proxy, but my work environment cannot use a proxy. Do any experts have any solutions? |
严重
[准确性]
"bussiness carpooling"
理由: Misspelling: 'bussiness' should be 'Business' | Mistranslation of '拼车' (shared subscription/shared seat). Should be 'shared Business subscription' not 'carpooling'. Also 'bussiness' is misspelled in source but hypothesis preserves the error. | Mistranslation of '拼车' in this context. It refers to sharing a subscription cost, not physical carpooling. 'Shared subscription' is accurate. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Ask the big guys a question, can the business carpool API be used without a proxy? |
严重
[准确性]
"can the business carpool API be used without a proxy"
理由: The source '免代' implies bypassing the need for a proxy/VPN entirely, whereas 'used without a proxy' could imply the API works natively without one, missing the nuance of the user's constraint regarding the 'ladder' (VPN). | Title translation is awkward; 'bussiness拼车' refers to 'Business shared subscription' not literal 'business carpool' | Misses context that 'bussiness' refers to a Business subscription/plan, not just a generic business API |
| google/gemma-3-12b-it | Title: Ask the experts a question, can the bussiness carpool API be exempt from agency? |
严重
[准确性]
"bussiness carpool API be exempt from agency"
理由: Incorrect translation of '免代' - should be 'without a proxy/VPN' not 'exempt from agency'. Also 'bussiness' is a typo from source (should be 'business'). | Mistranslation of '免代' (free from proxy/VPN requirement). 'Exempt from agency' is incorrect; should be 'used without a proxy' or similar. | Misinterprets 拼车 as carpool instead of shared subscription and 免代 as not needing a proxy/VPN; overall meaning of the question is distorted. | The translation fails to translate the 'Content' section entirely. The term 'exempt from agency' is an inaccurate translation of '免代' (use without a proxy/agent) in this context. 'Carpool' is a literal translation of '拼车' which is slang for shared subscription; while acceptable, the missing content is a critical omission. |
| google/gemma-3-1b-it | Translation of '问大佬们一个问题' in en |
严重
[准确性]
"Translation of '问大佬们一个问题' in en"
理由: Not an actual translation; just a meta description of the task. No English rendering of the Chinese text is provided. | Hypothesis is incomplete and does not provide a full translation. It only states what is being translated rather than providing the actual translation. | The hypothesis is a meta-instruction describing a translation task rather than the actual translated content. It fails to translate the source text. | Hypothesis is a literal translation placeholder instead of actual translation of the source content | The translation is literal and could be more natural. |
| google/gemma-3-4b-it | Asking seniors about a question: Can business carpooling APIs be bypassed without proxies? |
严重
[准确性]
"Can business carpooling APIs be bypassed without proxies?"
理由: Misinterprets '免代' (free from proxy requirement); should be 'used without a proxy' not 'bypassed without proxies' | Misinterprets “能免代吗” which asks if it can be used without a proxy/VPN, not bypassing the API itself; also omits Xianyu purchase and VPN/work-environment context. | The translation misses some context from the source, such as the specific issue with the API and the work environment. | The source 'bussiness 拼车' refers to shared subscription accounts in this context, not literal 'carpooling'. Translating it as 'carpooling' is a semantic error. |
| google/translategemma-12b-it | Question for experts: Is there a way to use the API for business carpooling without a proxy? |
严重
[准确性]
"bussiness拼车的api能免代吗?"
理由: The term 'business carpooling' is less precise than 'shared Business subscription' as used in the reference. | Misspelling of 'Business' - source has 'bussiness' which is a typo, but hypothesis correctly spells 'business' while reference spells 'Business' - this is actually correct translation of the source typo | Omitted key concepts: "拼车" (shared subscription/seat) mapped to "carpooling" literally but acceptable; however "免代" means "without VPN/proxy" in a more specific sense of network circumvention and context of Xianyu purchase is missing. Also lost mention that work environment cannot use VPN, reducing completeness. |
| google/translategemma-4b-it | Title: Asking for advice on business carpooling APIs - can they be used without VPN? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '问大佬们一个问题,bussiness拼车的api能免代吗?' in en |
严重
[准确性]
"Translation of '问大佬们一个问题,bussiness拼车的api能免代吗?' in en"
理由: The hypothesis is a meta-description of the task rather than the actual translation of the source text. No content was translated. | Incorrect spelling of 'business' | No actual translation is provided; it only describes that a translation should be given and omits key meanings such as carpool/shared subscription, API, and use without proxy/VPN. | Hypothesis is not a translation but a description of the task. The actual translation is missing. |
| tencent/HY-MT1.5-7B | "en": "Title: Asking the experts a question – Can business carpooling APIs be used without proxies?", "zh-TW": "Title: Asking the experts a question – Can business carpooling APIs be used without proxies?", "ja": "Title: Experts, may I ask a question? Can business carpooling APIs be used without proxies?", "ko": "Title: Experts, may I ask a question? Can business carpooling APIs be used without proxies?", "de":="Titel: Können Business-Carpooling-APIs ohne Proxy verwendet werden?", "fr": "Titre : Puis-je demander aux experts si les API de carpooling d’entreprise peuvent être utilisées sans proxy ?", "it": "Titolo: Gli esperti, posso chiedere se le API di carpooling aziendale possono essere utilizzate senza proxy?"} |
严重
[准确性]
"bussiness拼车的api能免代吗?"
理由: Misinterprets slang about shared Business subscription API access; should be about using a shared Business subscription API without VPN/proxy, not generic 'business carpooling APIs'. | Misspelling of 'business' in source text - should be 'business' not 'bussiness' | Mistranslation of slang '拼车' (shared subscription) as literal 'carpooling', losing the technical context. | Source uses 'bussiness拼车' which refers to a shared Business subscription service, not generic 'business carpooling APIs'. The reference correctly interprets this as 'shared Business subscription'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: [Open Source Introduction] CC-Switch-Web: More Compatible, Easier to Use, and More Stable Web Version of CC-Switch |
轻微
[术语]
"Open Source Introduction"
理由: Inconsistent terminology: '开源自荐' is better translated as 'Open Source Self-Promotion' or 'Open Source Introduction' but the reference uses 'Open Source Self-Recommendation'. | Should be 'Open Source Self-Recommendation' to match the source '[开源自荐]' and reference more accurately. | '自荐' implies self-recommendation or showcasing one's own work. 'Introduction' is too generic and misses the nuance of self-promotion common in open source communities. |
| CohereLabs/tiny-aya-water | We apologize for any inconvenience caused by this issue. |
致命
[准确性]
"We apologize for any inconvenience caused by this issue."
理由: Completely unrelated to source content about an open-source web version of CC-Switch; full mistranslation/omission. | Complete hallucination. The hypothesis bears no resemblance to the source text, which is a self-recommendation for an open-source project. | Hypothesis is completely unrelated to source content. Source is about CC-Switch-Web open source project, hypothesis is a generic apology statement. | The hypothesis does not match the source or reference content. | Complete mistranslation: The hypothesis is completely unrelated to the source content. The source is about CC-Switch-Web, but the hypothesis is an apology message. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Self-Nomination] CC-Switch-Web: More Compatible, Better to Use, and More Stable Web Version of CC-Switch Content: Hello everyone! A few days ago when I was using CC-Switch on the server, the GUI version didn't work (no graphical environment in SSH), so I made a web/headless version called cc-switch-web... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Self-recommendation for open source]CC-Switch-Web: More compatible, easier to use, and more stable Web version of CC-Switch Content: Hello everyone! A few days ago, when I was using CC-Switch on the server, the GUI version couldn’t run (no graphics in SSH environment), so I made a web/headless version of cc-switch-web... | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Raccomandazione Open Source: CC-Switch-Web: una versione Web più compatibile, più facile da usare e più stabile di CC-Switch | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: [Self-introduction] CC-Switch-Web: More Compatible · Easier to Use · More Stable Web Version of CC-Switch Content: Hello everyone! A few days ago, when I used CC-Switch on my server, the GUI version didn't run (no graphics in SSH environment), so I made a web/headless version called cc-switch-web... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [Open Source Recommendation] CC-Switch-Web: More Compatible, Easier to Use, and More Stable Web Version of CC-Switch Content: Hello everyone! A few days ago, I tried using CC-Switch on my server, but the GUI version wouldn't run (no graphics environment via SSH), so I created a web/headless version called cc-switch-web... | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Open Source Introduction] CC-Switch-Web: A More Compatible, User-Friendly, and Stable Web Version of CC-Switch |
轻微
[风格]
"[Open Source Introduction]"
理由: '自荐' means 'self-recommendation'. 'Introduction' is too generic and loses the nuance of the author promoting their own work. | “自荐” is better rendered as “self-recommendation” or “self-promotion” rather than “introduction”, slightly changing nuance of the post type. | Should be 'Self-Recommendation' not 'Introduction' to match '自荐' |
| google/gemma-3-12b-it | Title: [Open Source Self-Introduction] CC-Switch-Web: More Compatible, Better Usability, and More Stable Web Version CC-Switch Content: Hello everyone! I ran into an issue where the GUI version wouldn't run (no graphics in SSH environment) when using CC-Switch on my server a few days ago, so I created a web/headless version called cc-switch-web... | 通过 (无共识错误) |
| google/gemma-3-1b-it | "en": "Hello! I recently encountered an issue where the GUI version of CC-Switch didn't work on the server (SSH environment without graphics). So, I created a web/headless version of CC-Switch...", "zh-TW": "大家好! 我最近在使用 CC-Switch 时,GUI 版本运行不起来(SSH 环境没有图形),所以我做了一个 Web/头目版 CC-Switch...", "ja": "こんにちは! 最近、CC-SwitchのGUI版が動作しなかったのですが(SSH環境にはグラフィックスがない)、Web/ヘッドレス版 CC-Switchを作成しました...", "ko": "안녕하세요! 최근 CC-Switch를 사용하면서 GUI 버전이 작동하지 않았습니다(SSH 환경에는 그래픽이 없으므로), 웹/헤드리스 버전의 CC-Switch를 만들었습니다..." "de": "Hallo! Ich habe kürzlich festgestellt, dass die GUI-Version des CC-Switch nicht auf dem Server (SSH ohne Grafik) funktioniert hat. Deshalb habe ich eine Web-/Headless-Version von CC-Switch erstellt...", "fr": "Bonjour ! J'ai récemment constaté que la version GUI du CC-Switch ne fonctionnait pas sur le serveur (environnement SSH sans graphique), donc j'ai créé une version Web/Headless de CC-Switch...", "it": "Ciao! Recentemente ho riscontrato che la versione GUI di CC-Switch non funzionava sul server (ambiente SSH senza grafica), quindi ho creato una versione Web/Headless di CC-Switch..." } | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: [Open Source Recommendation] CC-Switch-Web: More Compatible, Better, and More Stable Web Version of CC-Switch Content: Hello everyone! I encountered an issue with the GUI version of CC-Switch not running (in an SSH environment without graphics) a few days ago, so I created a web/headless version of cc-switch-web... | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: [Open Source Self-Introduction] CC-Switch-Web: A more compatible, user-friendly, and stable Web version of CC-Switch Content: Hello everyone! The other day, when I was using CC-Switch on my server, the GUI version wouldn't run (the SSH environment didn't have graphics), so I created a web/headless version called cc-switch-web... | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: [Open Source Recommendation] CC-Switch-Web: More compatible, better, and more stable Web version of CC-Switch | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
严重
[准确性]
"Translation of '登录' in en"
理由: Translation of '登录' is not required in the reference | Hypothesis translates '登录' (login) which is not present in the source. The source is about CC-Switch-Web, not login. | The hypothesis translates the wrong source segment ('登录' vs the provided Title/Content) and outputs a meta-description instead of the translation. | The hypothesis contains only a metatextual description and no English translation of “登录” (e.g. log in / sign in). |
| tencent/HY-MT1.5-7B | "en": "Title: Open Source Recommendation: CC-Switch-Web – A More Compatible, User-Friendly, and Stable Web Version of CC-Switch", "zh-TW": "标题:開源推薦:CC-Switch-Web——一個更加兼容、易用且穩定的CC-Switch網頁版", "ja": "タイトル:オープンソース推薦:CC-Switch-Web——より互換性が高く、使いやすく、安定したCC-Switchのウェブバージョン", "ko": "제목: 오픈소스 추천: CC-Switch-Web — 더욱 호환성이 높고 사용하기 쉽으며 안정적인 CC-Switch의 웹버전", "de":="Titel: Open Source Empfehlung: CC-Switch-Web – Eine kompatiblere, benutzerfreundlichere und stabilere Webversion von CC-Switch", "fr":="Titre : Recommandation open source : CC-Switch-Web – Une version web de CC-Switch plus compatible, plus facile à utiliser et plus stable", "it":="Titolo: Raccomandazione open source: CC-Switch-Web – Una versione web di CC-Switch più compatibile, più user-friendly e più stabile"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 🎯 CC-Switch CLI - Manage Claude/Codex/Gemini on Server! | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: 🎯 CC-Switch CLI - Manage Claude/Codex/Gemini on Server! |
轻微
[风格]
"Manage Claude/Codex/Gemini on Server!"
理由: Incomplete translation: '服务器上也能优雅管理' means 'can elegantly manage on servers too', but the hypothesis only says 'Manage...on Server' without the 'elegantly' and 'too' nuances. | Omission of '优雅' (elegantly), which is a key selling point in the source title. | Omission of “elegantly” weakens the intended tone but core meaning is preserved. |
| Qwen/Qwen2.5-14B-Instruct | Title: 🎯 CC-Switch CLI - Now you can manage Claude/Codex/Gemini elegantly on servers too! Content: Hello everyone! I've been using CC-Switch to manage the configurations of Claude Code, Codex, and Gemini, and it has been great. But recently when trying to use it on a server (via SSH remote connection), I was disappointed because the GUI version requires a graphical interface and simply won't run. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: 🎯 CC-Switch CLI - Manage Claude/Codex/Gemini on Servers Now! Content: Hello everyone! Previously, we used CC-Switch to manage the configurations of Claude Code, Codex, and Gemini, which was very good. But recently when trying to use it on servers (via SSH remote connection), I was taken aback – the GUI version requires a graphical interface and simply won’t run. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Incomplete translation | Hypothesis is incomplete/malformed. Contains only '...}' which is not a valid translation. | The hypothesis is malformed JSON containing only ellipsis and a brace, failing to translate any part of the source text. |
| Qwen/Qwen3.5-4B | Title: 🎯 CC-Switch CLI - Now you can elegantly manage Claude/Codex/Gemini on your server too! Content: Hello everyone! I've been using CC-Switch to manage Claude Code, Codex, and Gemini configurations, and the experience has been great. But recently when trying to use it on my server (via SSH remote connection), I was stumped — the GUI version requires a graphical interface and simply won't run. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: 🎯 CC-Switch CLI - Finally manage Claude/Codex/Gemini elegantly even on servers! Content: Hello everyone! I've been using CC-Switch to manage configurations for Claude Code, Codex, and Gemini, and the experience has been great. But recently when trying to use it on a server (via SSH remote connection), I was stumped — the GUI version requires a graphical interface and simply won't run. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: "🎯 CC-Switch CLI - Now you can elegantly manage Claude/Codex/Gemini on your server too!" | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: 🎯 CC-Switch CLI - Gracefully Manage Claude/Codex/Gemini Even on Servers! | 通过 (无共识错误) |
| google/gemma-3-1b-it | Hello everyone! |
轻微
[准确性]
"Hello everyone!"
理由: Only translates the greeting and omits the rest of the sentence about prior usage of CC-Switch and GUI issues; incomplete translation. | Hypothesis only provides a partial translation of the opening phrase. It does not translate the full source content as required. | Hypothesis only translates the greeting but omits the entire content including the title and main body describing the CLI tool | Severe under-translation. The hypothesis only translates the greeting '大家好!' and completely omits the rest of the source content regarding the usage experience and the GUI issue on servers. |
| google/gemma-3-4b-it | Title: 🎯 CC-Switch CLI - Now you can gracefully manage Claude/Codex/Gemini on servers too! | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: 🎯 CC-Switch CLI - Now you can elegantly manage Claude/Codex/Gemini even on servers! | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: 🎯 CC-Switch CLI - Manage Claude/Codex/Gemini gracefully on servers! | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
轻微
[准确性]
"Translation of '登录' in en"
理由: Translation of '登录' is not required in the reference | Hypothesis translates '登录' (login) which is not present in the source. The source is about CC-Switch CLI, not login. | The hypothesis lacks the actual English term for “登录” and only states that it is a translation. | The hypothesis translates a non-existent segment ('登录') and provides a meta-description instead of translating the provided Title and Content. |
| tencent/HY-MT1.5-7B | "en": "Title: 🎯 CC-Switch CLI - Elegantly manage Claude/Codex/Gemini on servers now!", "zh-TW": "Title: 🎯 CC-Switch CLI - Now you can elegantly manage Claude/Codex/Gemini on servers!", "ja": "Title: 🎯 CC-Switch CLI - 이제 서버에서도 Claude/Codex/Gemini를 우아하게 관리할 수 있습니다!", "ko": "Title: 🎯 CC-Switch CLI - 이제 서버에서도 Claude/Codex/Gemini를 효율적으로 관리할 수 있습니다.", "de":="Title: 🎯 CC-Switch CLI – Jetzt können Sie Claude/Codex/Gemini auch auf Servern effizient verwalten!", "fr": "Title: 🎯 CC-Switch CLI - Gérer maintenant Claude/Codex/Gemini de manière élégante même sur les serveurs!", "it": "Title: 🎯 CC-Switch CLI - Ora è possibile gestire elegantemente Claude/Codex/Gemini anche sui server!"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: ChatGPT Team Slot * 1 One drives in email One bind one account |
轻微
[准确性]
"One drives in email One bind one account"
理由: Complete failure to translate slang. '开车' means starting a shared group/session, and '一卡一绑' means one card linked to one account. The hypothesis is nonsensical. | Mistranslation: '车位' means 'shared seat/slot' not 'Slot'. '开车了' means 'the shared group is starting' not 'One drives in'. '一卡一绑' means 'one card tied to one account' not 'One bind one account'. The translation is garbled and loses the original meaning. | Misinterpretation of the original Chinese text. | Severely garbled translation. Should be 'Join the shared group using your email. 1 card tied to 1 account' based on the source and reference. |
| CohereLabs/tiny-aya-water | Title: ChatGPT Team Slot * 1 One drive went one card one bind |
致命
[准确性]
"one card one bind"
理由: Nonsensical literal translation of slang. '开车了' means the group is starting/active. '一卡一绑' means one credit card bound to one account. The hypothesis is unintelligible. | Fails to clearly convey “1 card tied to 1 account / 1 seat”; wording is confusing. | Nonsensical translation. Should be 'The shared group is starting, 1 card tied to 1 account' or similar. Current translation is incomprehensible. |
| Qwen/Qwen2.5-14B-Instruct | Title: ChatGPT Team Subscription Slot * 1 Available - One Account Per Entry Content: Prize Details: [Prize 1]: ChatGPT Team Subscription Slot * 1, Email Activation Required, No Warranty Provided | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: ChatGPT Team slot * 1 share Content: Prize details: [Prize 1]: ChatGPT Team slot * 1 , email subscription, no warranty", "zh-TW": " Title: ChatGPT 団隊 共享 名額 * 1 個 上車了 一卡一綁\nContent: 獎品細節: [獎品 1]:ChatGPT 団隊 共享 名額 * 1 , 電子郵件 訂閱, 沒有任何質保", "ja": "タイトル: ChatGPT チーム 共有 枠 * 1 相乗り了 一カード一绑定\n内容: 優勝品詳細: [優勝品 1]:ChatGPT チーム 共有 枠 * 1 , イメーレス購読, 何の保証もなし", "ko": "제목: ChatGPT 팀 공유 자리 * 1 개 탑승了一卡一绑\n내용: 상품 세부사항: [상품 1]:ChatGPT 팀 공유 자리 * 1 , 이메일 구독, 어떠한 보증도 없음", "de": "Titel: ChatGPT-Team Platz * 1 Teilen Inhalt: Preisdetails: [Preis 1]: ChatGPT-Team Platz * 1 , E-Mail-Abo, keine Garantie", "fr": "Titre: Équipe ChatGPT place * 1 partage Contenu: Détails du prix : [Prix 1] : place d'équipe ChatGPT * 1 , abonnement par e-mail, aucune garantie", "it": "Titolo: Equipe ChatGPT posto * 1 condivisione Contenuto: Dettagli del premio: [Premio 1] : posto di equilibrio ChatGPT * 1 , iscrizione per posta elettronica, nessuna garanzia" } | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Title: ChatGPT Team Slot * 1 Shared Subscription Started One Card One Binding Content: Prize Details: [Prize 1]: ChatGPT Team Slot * 1, Email Onboarding, No Warranty Provided | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: ChatGPT Team Slot * 1 Seat. Ready to go. One card, one binding. Content: Prize Details: [Prize 1]: ChatGPT Team Slot * 1 Seat. Email signup, no warranty. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: ChatGPT Team Slot * 1 Account - Ready to Drive, One Card Per Binding Content: Prize Details: [Prize 1]: ChatGPT Team Slot * 1, Email Registration, No Warranty Provided | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: ChatGPT Team shared subscription slot * 1 instance bound to an email |
严重
[准确性]
"* 1 instance bound to an email"
理由: Misses the key context '一卡一绑' (1 card tied to 1 account) and doesn't capture the full meaning of the original title | Misses nuances of “开车了” (the group is starting) and “一卡一绑” (one card per account). Adds “bound to an email”, which is not explicitly in the source and partially distorts details. | The source '一卡一绑' refers to the ratio of payment cards to accounts (1 card tied to 1 account). The hypothesis translates it as the slot being bound to an email, which misses the specific constraint about payment methods implied in the slang. | Incorrect translation: '车位' means 'seat/slot' not 'subscription slot'; '一卡一绑' means '1 card tied to 1 account' not 'bound to an email' |
| google/gemma-3-12b-it | Title: ChatGPT Team Slot * 1 Available - Let's Go! One Card, One Bind Content: Prize Details: [Prize 1]: ChatGPT Team Slot * 1, Email Access, No Warranty | 通过 (无共识错误) |
| google/gemma-3-1b-it | ChatGPT Team Share Slot |
严重
[准确性]
"ChatGPT Team Share Slot"
理由: Could be more precise as 'shared seat'. | Incomplete translation. The hypothesis only attempts to translate the title fragment and omits the content entirely. Additionally, 'Share Slot' is a less accurate rendering of '车位' (seat/spot) in this context compared to 'shared seat'. | "Share Slot" roughly matches "车位"/"shared seat" but omits the nuance of a single seat/slot ("* 1 个") and doesn’t clearly indicate it is one shared seat as in the reference. | Hypothesis is an incomplete translation that omits key details: '1个', '开车了', '一卡一绑', and the content about prize details and warranty | Hypothesis translates only the title phrase but misses the context. '车位' in this context means 'shared seat/slot' but the full translation should address the entire source content about the prize details. |
| google/gemma-3-4b-it | Title: ChatGPT Team Slot * 1 , Driving on, One card one bind |
严重
[准确性]
"Driving on"
理由: '开车了' in this slang context means the shared group is starting or active, not literally 'driving'. The translation is nonsensical. | Severely mistranslates Chinese slang; '车位' means 'shared seat', '开车了' means 'the shared group is starting', '一卡一绑' means 'one card tied to one account' | Misinterprets “开车了” which means the group/share is starting, not literal driving. |
| google/translategemma-12b-it | Title: ChatGPT Team Slot * 1 Available - Ready to Go! (One account per slot) |
严重
[准确性]
"(One account per slot)"
理由: The hypothesis only translates the title but omits the content. The source includes prize details that should be translated. | The hypothesis only translates the title and omits the 'Prize details' content. Additionally, '开车了' is mistranslated as 'Ready to Go' instead of indicating the group/session is starting, and '一卡一绑' is loosely translated. | Source "一卡一绑" means one card bound to one account (card-account binding), not one account per slot; added implication about slot-account mapping not present in source. |
| google/translategemma-4b-it | Title: ChatGPT Team Slot * 1 Share Driving One Card One Bind | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: ChatGPT Team 车位 * 1 个 开车了 一卡一绑' in en |
严重
[准确性]
"Translation of 'Title: ChatGPT Team 车位 * 1 个 开车了 一卡一绑' in en"
理由: No concrete English rendering is supplied; shared seat / group starting / one card bound to one account are not translated. | Hypothesis is not a translation but a description of the task. The actual translation is missing. | Translation of '车位' as 'shared seat' is more accurate | The hypothesis is a meta-instruction string, not the actual translation of the source title. |
| tencent/HY-MT1.5-7B | Title: ChatGPT Team Vehicle Slot * 1; Drive Now, One Card per Account |
严重
[准确性]
"Vehicle Slot"
理由: '车位' here is slang for a shared subscription 'seat' in a Team plan, not a literal vehicle parking slot. | Source '车位' means 'shared seat' or 'shared subscription slot', not 'vehicle slot'. Reference correctly uses 'shared seat'. | Mistranslation of slang '车位' (shared seat/spot) as literal 'Vehicle Slot'.
严重
[准确性]
"Drive Now"
理由: Mistranslation of slang '开车了' (group starting/organizing) as literal 'Drive Now'. | '开车了' in this context means 'the group/subscription is starting' rather than literally 'drive now'. | Source '开车了 一卡一绑' means the shared group is starting/active, with one card tied to one account. Hypothesis misses the context of group activation. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Correct Configuration of OPENCLAW using GGBOOM's Public Proxy Site | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Correct Configuration of OPENCLAW using GGBOOM's Public Proxy Site | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Correct Configuration for Using GGBOOM Free Shared Site with OPENCLAW Content: Following the guidance from teachers and exploration within the forum, the configuration for openclaw.json in lobster is as follows... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Correct Configuration for Using GGBOOM Public Proxy with OPENCLAW Content: Based on the guidance and exploration by teachers in the forum, the configuration of openclaw.json in the lobster is as follows... |
严重
[术语]
"Public Proxy"
理由: Inconsistent terminology: '公益站' means 'free API proxy site' not just 'public proxy'; reference uses 'free API proxy site' | '公益站' is better translated as 'free API proxy site' in this specific community context, though 'Public Proxy' is understandable. | The reference uses 'free API proxy site' which is more specific and accurate than 'Public Proxy'. |
| Qwen/Qwen3-14B | {"Title": "Correct Configuration for Using GGBOOM Free Shared API Site with OPENCLAW", "Content": "With frequent guidance from teachers on the forum and exploration, the configuration in openclaw.json within Longxiao is as follows..."} | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Correct Configuration for Using GGBOOM Free Shared API Site with OpenClaw Content: With frequent guidance and exploration from teachers on the forum, here is the configuration for openclaw.json within the lobster... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Correct Configuration of OPENCLAW Using GGBOOM Free Shared API Site Content: Following frequent guidance and exploration from teachers on the forum, the configuration in openclaw.json within Lobster is as follows... | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | OPENCLAW使用GGBOOM免费API的正确配置方法 |
严重
[准确性]
"免费API"
理由: Translation error: '公益站' means 'free API proxy site' not 'free API'; '正确配置' is correct but context is missing 'The correct configuration for using' | Source says “公益站” (free public/proxy site for public benefit), not explicitly “免费API”; meaning is close but less precise to the proxy-site aspect. | Hypothesis is in Chinese while target language is zh but should be in English; also changes '公益站' (public/free proxy site) to '免费API' which loses nuance |
| google/gemma-3-12b-it | Title: Correct Configuration for OPENCLAW using GGBOOM Public Proxy Site Content: Thanks to the guidance and exploration of teachers in the forum, the openclaw.json configuration in Longxia is as follows... |
严重
[准确性]
"in Longxia"
理由: Misreads “龙虾里的 openclaw.json” as a place name, Longxia; it actually refers to where the config file lives inside the OpenClaw (龙虾) project. | '龙虾' (Lobster) is a nickname for the software 'OpenClaw'. Translating it as 'Longxia' (pinyin) loses the semantic meaning and connection to the software name. | 'Longxia' is a transliteration of '龙虾' (lobster) which is a nickname for OpenClaw, not a literal place name; should be 'OpenClaw' | Incomplete translation. Missing content and context. Should reference 'GGBOOM free API proxy site' and 'OpenClaw' more clearly. |
| google/gemma-3-1b-it | The correct configuration of OpenCLaw.json using GGBOOM public API site |
严重
[准确性]
"The correct configuration of OpenCLaw.json using GGBOOM public API site"
理由: Incomplete translation. The hypothesis only covers the title and ignores the content. It also mistranslates '公益站' as 'public API site' instead of 'free API proxy site' or 'charity station' in this specific community context. | Hypothesis incorrectly translates '龙虾' (lobster/OpenClaw nickname) and misses the phrase '经常坛里老师们的指导' which should be 'Through the guidance of experts in the forum'. The translation is incomplete. | Hypothesis is in English instead of Chinese (target language is zh). Also 'GGBOOM公益站' should be translated as 'GGBOOM free API proxy site' not 'public API site' | Misrepresents the relation: original is about using GGBOOM free API proxy site in OPENCLAW; hypothesis implies configuring OpenClaw.json itself using GGBOOM and omits "proxy" and "free" nuance. |
| google/gemma-3-4b-it | Title: Correct Configuration Using GGBOOM Community Sites for OPENCLAW |
严重
[准确性]
"Community Sites"
理由: '公益站' in this context refers to 'free API proxy sites', not 'Community Sites'. The meaning is significantly altered. | Source refers to “GGBOOM公益站” which is a specific free API proxy site, not generic community sites. | Misses key context; should specify 'free API proxy site' not just 'community sites'; reference clarifies this is about using GGBOOM as an API proxy |
| google/translategemma-12b-it | Title: Correct configuration for using OPENCLAW with GGBOOM public proxy. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Correct configuration using GGBOOM free shared API site for OPENCLAW |
轻微
[术语]
"GGBOOM free shared API site"
理由: Should be 'free API proxy site' not 'free shared API site'. Reference correctly uses 'free API proxy site'. | Source/refs use "公益站" as "free API proxy site"; "shared API site" is vague and misses "proxy" aspect. | '公益站' implies a free service provided by the community, often acting as a proxy. 'Free shared API site' is acceptable but 'free API proxy site' (reference) is more precise regarding the technical function. |
| tencent/HY-MT1.5-1.8B | "en": "Title: OPENCLAW uses the GGBOOM public proxy site for correct configuration", "zh-TW": "标题:OPENCLAW使用GGBOOM公益站进行正确配置", "ja": "タイトル:OPENCLAWはGGBOOM公益ステーションを使用して正しい設定を行う", "ko": "제목: OPENCLAW는 GGBOOM 공용 프로키스 사이트를 사용하여 올바른 설정을 수행한다", "de": "제목: OPENCLAW verwendet den GGBOOM öffentlichen Proxy-Server für eine korrekte Konfiguration", "fr": "제목: OPENCLAW utilise le site proxy public GGBOOM pour une configuration correcte", "it": "Titolo: OPENCLAW utilizza il sito proxy pubblico GGBOOM per una configurazione corretta"} |
严重
[准确性]
""en": "Title: OPENCLAW uses the GGBOOM public proxy site for correct configuration""
理由: Source implies configuring how to use the GGBOOM free/public site in OPENCLAW, whereas the hypothesis wording can be read as OPENCLAW using the site itself; it misses “for using the GGBOOM free API proxy site in OPENCLAW” nuance. | The hypothesis outputs a JSON object containing translations for multiple languages instead of just the English translation for the specific source text. It also fails to translate the Content. | Hypothesis contains multiple languages and is not a proper translation. The English part is also incomplete and missing the Content translation. |
| tencent/HY-MT1.5-7B | Title: Correct Configuration for Using the GGBOOM Public Welfare Site with OPENCLAW | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Discussing the usage statistics of the free API sites I have used recently |
严重
[准确性]
"Discussing the usage statistics of the free API sites I have used recently"
理由: Misses the 'showing off' aspect and context. Should be 'Showing off the usage of the free API proxy site over the past few days' per reference. | Misrenders “晒一下…用量” (showing off usage) as “discussing”; also plural “sites” and “I have used” are not in source. | Mistranslation: '晒一下' means 'to show off' not 'discussing'. The original is about showing off usage statistics, not just discussing them. Also missing the context of '公益站' (free API proxy sites). | The source '晒一下' means 'show off' or 'display', implying a boastful sharing of stats. 'Discussing' changes the intent to a neutral conversation. |
| CohereLabs/tiny-aya-water | Content: From [Mo API] Codex - 20260306 Keep talking... Also about 133 billion still okay... Just attack if you want, Business Plan is smooth, keep going! |
严重
[准确性]
"Content: From [Mo API] Codex - 20260306 Keep talking... Also about 133 billion still okay... Just attack if you want, Business Plan is smooth, keep going!"
理由: Mistranslation: '133亿' means '13.3 billion' (tokens), not '133 billion'. '爽蹬' means 'happily/spam/freeride' not 'Keep talking'. '无感' means 'unaffected/doesn't care' not 'smooth'. | Should be '13.3 billion tokens' not '133 billion'. Misread the number. Also 'still okay' misses the context of discussing API usage metrics. | Title is omitted; “爽蹬” (spam/freeride) is mistranslated as plain “Codex”; “也就 133 亿 还行吧” should specify tokens; “无感” means unaffected, not just “smooth”. | Mistranslation of '爽蹬' (happily freeriding/spamming) as just 'Codex'. Misses the negative/consumer slang nuance. |
| Qwen/Qwen2.5-14B-Instruct | Title: Show the Usage of Public Proxy Sites These Days Content: From [Mo API] Codex Smooth Run - Continue Discussion... Around 13.3 Billion... Not Bad... Go Ahead and Attack, Business Plan Unaffected, Keep It Up | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Share the Usage of the Public API Site Recently Content: From [Mo API] Codex爽蹬 - 20260306 Continue discussing... It's around 133 billion, not bad right... Attack away, Business Plan is unaffected, keep going! |
严重
[准确性]
"133 billion"
理由: Critical error: '133亿' means '13.3 billion' (133亿 = 13.3 billion), not '133 billion'; the decimal point is crucial | Misreads 133亿 tokens as generic 133 billion (unspecified); reference clarifies token unit. Losing “tokens” can mislead about what is being counted. | The reference correctly translates this as '13.3 billion tokens', not '133 billion'. The hypothesis misses the 'tokens' unit and gets the magnitude wrong. | Missing unit. In the context of LLM APIs, this refers to 'tokens'. Without the unit, the number is ambiguous.
严重
[准确性]
"Codex爽蹬"
理由: Untranslated term: '爽蹬' appears in source but is not translated; reference translates it as 'Spam/freeride' | The hypothesis doesn't translate this phrase. The reference interprets it as 'Spam/freeride Codex happily' which better captures the slang meaning. | Failed to translate the slang '爽蹬'. It implies using the service freely/abusing it or riding for free. Leaving it in Pinyin makes the text unintelligible to an English reader. |
| Qwen/Qwen3-14B | Pubblicazione dell'utilizzo del sito proxy negli ultimi giorni | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Show usage stats from the free shared API site over these days Content: Continuing discussion from [Mo API] Codex Cool Ride - 20260306 ... That's about 13.3 billion, pretty good... Go ahead and attack it, Business Plan is unaffected, keep it up! | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Check out the usage on this free shared API site over the past few days Content: Continuing from [Mo API] Codex 爽蹬 - 20260306... Around 133 billion requests, not bad... Feel free to attack, Business Plan is unaffected, keep going! |
严重
[准确性]
"133 billion requests"
理由: Context implies tokens (common metric for LLMs), not requests. Reference correctly identifies 'tokens'. | Original refers to 13.3 billion tokens, not requests; unit mistranslated. | Should be '13.3 billion tokens' not '133 billion requests'. The number is 133 but refers to tokens, not requests, and should be 13.3 billion |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Check the usage of these public proxies over the past few days. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Show off the usage of free API sites over the past few days Content: Continuing discussion from [Mo API] Codex cool and refreshing - 20260306... just 133 billion, not bad... feel free to attack, Business Plan is unaffected, keep up the good work! |
严重
[准确性]
"just 133 billion"
理由: Incorrect number conversion. Should be '13.3 billion tokens' not '133 billion'. | Major numerical error: '133 billion' vs reference '13.3 billion tokens' - off by factor of 10 | Missing the unit 'tokens' which is implied in the context of API usage, making the number ambiguous. | Reference clarifies 13.3 billion tokens, not simply 133 billion of an unspecified unit; omission of 'tokens' and the decimal detail is a notable loss of meaning. |
| google/gemma-3-1b-it | Translation of '晒一下这几天公益站用量' in en |
严重
[准确性]
"Translation of '晒一下这几天公益站用量' in en"
理由: The hypothesis is a meta-instruction describing a translation task rather than the actual translated content. It fails to translate the source text. | Hypothesis only states what is being translated rather than providing the actual translation. This is incomplete. | Hypothesis is a literal translation placeholder instead of actual translation of the source content | No translation provided; only a meta description of the task instead of an English rendering. | Could be more naturally translated. |
| google/gemma-3-4b-it | Share your usage of public proxy sites lately |
严重
[准确性]
"Share your usage of public proxy sites lately"
理由: The translation is too brief and misses some context from the source. | Changes perspective from first person “晒一下这几天公益站用量” (showing off my usage) to an imperative asking others to share; also omits the humorous 13.3B usage and Business Plan context. | The hypothesis omits the specific content details (token usage numbers, specific plan mentions, sarcasm) found in the source. | Incomplete translation; misses the boastful tone of '晒' (showing off); should be 'Showing off the usage of the free API proxy site over the past few days' |
| google/translategemma-12b-it | Title: Show off the usage of public proxy sites over the past few days. Content: From [Mo API] Codex, enjoying a smooth experience - 20260306 Continue the discussion... about 133 billion, not bad... Feel free to attack, Business Plan is unaffected, keep up the good work! | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Share your usage of public API sites recently |
严重
[准确性]
"Share your usage of public API sites recently"
理由: Translating '公益站' as 'public API sites' misses the nuance of 'free/charity' (公益) and the specific context of unofficial proxy services. 'Public' implies official or open access, which is incorrect here. | Incomplete and inaccurate translation. Missing context about '公益站' (free API proxy sites) and the specific discussion being referenced. Should capture the boastful tone about usage statistics. | Source is first-person "晒一下这几天公益站用量" meaning author is showing off their usage, not asking others to share; subject and intent changed. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 晒一下这几天公益站用量' in en |
严重
[准确性]
"Translation of 'Title: 晒一下这几天公益站用量' in en"
理由: Translation of '公益站用量' as 'usage of the free API proxy site' is more accurate | The hypothesis is a meta-description string, not the actual translation of the source title. | Hypothesis is incomplete and does not provide a full English translation. It only states 'Translation of...' without delivering the actual translation. Reference provides 'Showing off the usage of the free API proxy site over the past few days' which is a complete, accurate translation. | Hypothesis is not a translation but a description of the task. The actual translation is missing. | The hypothesis does not give any English phrase and omits the idea of “showing off” recent usage of the free proxy site. |
| tencent/HY-MT1.5-7B | Title: Usage Statistics of the Public Welfare Site in Recent Days | 通过 (无共识错误) |
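
Several rationales in the table above turn on the same arithmetic: 亿 is 10^8, so 133亿 is 133 × 10^8 = 1.33 × 10^10, which is 13.3 billion and not 133 billion. The sketch below works through that conversion. The helper names `parse_cn_number` and `in_billions` are hypothetical and are not part of the evaluation pipeline. It assumes the input is ASCII digits followed by an optional 万 or 亿 unit.

```python
# Illustrative only: convert "digits + Chinese unit" into an integer count.
# Assumption: the input looks like "133亿" or "5万" (ASCII digits, optional unit).
UNIT_VALUES = {"万": 10**4, "亿": 10**8}


def parse_cn_number(text: str) -> int:
    """Parse strings such as '133亿' into an integer."""
    text = text.strip()
    unit = text[-1]
    if unit in UNIT_VALUES:
        # Multiply the numeric part by the unit's value (亿 = 10^8, 万 = 10^4).
        return round(float(text[:-1]) * UNIT_VALUES[unit])
    return int(text)


def in_billions(value: int) -> float:
    """Express an integer count in English billions (10^9)."""
    return value / 10**9


tokens = parse_cn_number("133亿")
print(tokens)               # 13300000000
print(in_billions(tokens))  # 13.3
```

A check of this kind would catch the factor-of-ten slip, but not a wrong or missing unit ("tokens" versus "requests"), which still needs a human or contextual judgment.
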
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Openclaw How to change search engine API |
轻微
[流畅性]
"Openclaw How to change search engine API"
理由: Missing punctuation and article; more natural would be “How to change the search engine API in OpenClaw”. | The title lacks a verb or preposition to connect the subject and the action, making it sound like a keyword string rather than a natural title. | Missing article. Should be 'How to change the search engine API for OpenClaw' for better fluency. |
| CohereLabs/tiny-aya-water | Title: How to change Openclaw's search engine API | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: How to Change Search Engine API for Openclaw Lobster Content: Bro, when I use openclaw, sometimes the search results are messy and chaotic. So I asked lobster what search API he uses, and the returned result is from Brave's API. Is there any way to switch to Google's search API? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Openclaw How to Change Search Engine API Content: Bro, when I was using openclaw, I found that sometimes its search results were very messy. So I asked the developer what search API he was using, and the response was Brave's API. Is there any way to switch to Google's search API? |
严重
[准确性]
"the developer"
理由: Mistranslation of '问龙虾'. The user asked the software 'Lobster' (OpenClaw), not the human developer. The reference correctly identifies this. | Inaccurate: '我就问龙虾他用的是什么搜索api' means 'I asked OpenClaw (the tool)' not 'the developer'; reference correctly says 'I asked OpenClaw' | Source says “我就问龙虾他用的是什么搜索api”, where “龙虾” refers to the app OpenClaw (lobster), not necessarily the developer; introduces wrong agent. | The hypothesis adds 'the developer' which is not in the source. The reference more accurately translates '龙虾' as 'OpenClaw' directly. |
| Qwen/Qwen3-14B | Come cambiare l'API del motore di ricerca di Openclaw Granchio | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: How to change the search engine API for OpenClaw Lobster Content: Hey guys, I noticed that when using OpenClaw, the search results can sometimes be very messy and disorganized. I asked Lobster what search API he uses, and he replied that it's Brave's API. Is there any way to switch to Google's search API? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: How to change the search engine API for Openclaw Lobster? Content: Hey guys, I noticed that when using Openclaw, sometimes the search results are quite messy and disorganized. I asked Lobster what search API he uses, and he replied that it's Brave's API. Is there any way to switch to Google's search API? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Openclaw Dragon虾如何更换搜索引擎API |
严重
[流畅性]
"Openclaw Dragon虾"
理由: Code-switching within a single noun phrase without reason makes the title unreadable. | Mixes English and Chinese redundantly; “Dragon虾” is unnatural and incorrect for “龙虾” (nickname for OpenClaw). | Hypothesis is missing Content section |
| google/gemma-3-12b-it | Title: Openclaw Lobster - How to Change Search Engine API Content: Hey, I noticed that when using Openclaw, the search results can sometimes be messy and random. I asked Lobster what search API he was using, and the response was Brave’s API. Is there a way to switch to Google’s search API? |
轻微
[准确性]
"I asked Lobster what search API he was using"
理由: 'Lobster' used as entity name when it should be 'OpenClaw' per reference | Incorrect pronoun usage. The antecedent is the software 'Openclaw' (or the entity referred to as '龙虾'), which should be referred to as 'it', not 'he'. | Should be 'I asked OpenClaw what search API it uses' - OpenClaw is a tool, not a person named Lobster. |
| google/gemma-3-1b-it | Translation of 'Openclaw 龙虾怎么换搜索引擎API' in en |
严重
[准确性]
"Translation of 'Openclaw 龙虾怎么换搜索引擎API' in en"
理由: The hypothesis is a meta-instruction describing a translation task rather than the actual translated content. It fails to translate the source text. | Hypothesis only states what is being translated rather than providing the actual translation. This is incomplete. | Hypothesis is a literal translation placeholder instead of actual translation of the source content | Could be more naturally translated. | Meta description only; does not provide the actual English translation of the Chinese title. |
| google/gemma-3-4b-it | Title: How to change search engine API for Openclaw 龙虾 Content: Bro, when I use Openclaw, I find that the search results are sometimes messy and chaotic. I asked Openclaw what search API he was using, and he said it was Brave’s API. Is there a way to switch it to Google’s search API? | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: How to change the search engine API for Openclaw Lobster Content: Hey everyone, I noticed that when using Openclaw, the search results sometimes come up as mixed and messy. I asked Lobster what search API he was using, and the response was Brave's API. Is there a way to switch to Google's search API? |
严重
[准确性]
"I asked Lobster what search API he was using"
理由: The source says 'asked 龙虾' (asked [the software/community about] Lobster), implying asking about the tool's configuration. The hypothesis personifies the software ('he') slightly awkwardly compared to the reference ('it'). | The hypothesis incorrectly interprets 'Lobster' as a person and 'he' as a pronoun. Reference clarifies it should be 'OpenClaw' (the tool) and 'it' (the tool), not a person. | User actually asked OpenClaw; using "Lobster" as a person/entity changes reference and can confuse readers unfamiliar with the nickname. |
| google/translategemma-4b-it | Title: How to change Openclaw's search engine API | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾' in English |
严重
[准确性]
"Translation of '龙虾' in English"
理由: Hypothesis translates '龙虾' (lobster) literally, but in this context it refers to 'OpenClaw' (a software tool). The reference correctly translates it as OpenClaw. | No English term is provided; here “龙虾” should be rendered as the product name “OpenClaw”, not literally “lobster”. | The hypothesis is a meta-description string, not the actual translation of the term '龙虾' (which should be 'OpenClaw' or 'Lobster' depending on context, but the output is just the instruction). |
| tencent/HY-MT1.5-7B | "en": "Title: How to Change the Search Engine API for Openclaw Lobster", "zh-TW": "标题:如何更改 Openclaw 龙虾的搜索引擎 API", "ja": "タイトル:Openclaw 龙虾の検索エンジン APIを変更する方法", "ko": "제목: Openclaw Lobster의 검색 엔진 API를 변경하는 방법", "de":="Titel: Wie man die Suchmaschinen-API von Openclaw Lobster ändert", "fr":="Titre : Comment changer l’API du moteur de recherche d’Openclaw Lobster", "it":="Titolo: Come modificare l’API del motore di ricerca di Openclaw Lobster"} |
严重
[准确性]
"Openclaw 龙虾"
理由: Mistranslation of nickname '龙虾' (Lobster, referring to OpenClaw) as part of the proper noun 'Openclaw Lobster', whereas it should be just 'OpenClaw'. | Source uses '龙虾' (lobster) as a nickname/slang for OpenClaw, but this is informal context-specific terminology. Reference correctly uses just 'OpenClaw' without the nickname in formal title. | '龙虾' is a nickname for OpenClaw, not a separate 'Lobster' product; doubling 'Openclaw Lobster' suggests two entities and is inaccurate. |
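
One failure mode recurs in these tables: the hypothesis is a restatement of the task ("Translation of '…' in en") rather than a translation. The sketch below shows one way such outputs could be flagged automatically before arbitration. The pattern and the function name `is_meta_output` are assumptions made for illustration and do not come from the evaluation setup. The pattern covers only the two phrasings that appear in these tables.

```python
import re

# Hypothetical pattern: matches outputs that merely restate the task,
# e.g. "Translation of '<source>' in en" or "... in English".
META_OUTPUT = re.compile(r"Translation of '.*' in (?:en|English)", re.DOTALL)


def is_meta_output(hypothesis: str) -> bool:
    """Return True when the whole hypothesis is a task description."""
    return META_OUTPUT.fullmatch(hypothesis.strip()) is not None


samples = [
    "Translation of '晒一下这几天公益站用量' in en",
    "Translation of '龙虾' in English",
    "Title: How to change Openclaw's search engine API",
]
for sample in samples:
    print(is_meta_output(sample), sample)
```

Using `fullmatch` keeps the check narrow: a real translation that happens to contain the phrase "Translation of" somewhere in its text is not flagged.
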
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Question about using LDC to purchase Team Car |
严重
[准确性]
"using LDC to purchase Team Car"
理由: “Team拼车” is shared Team subscription, not a literal “Team Car”; loses subscription‑sharing meaning and omits that it’s GPT Team. | Incomplete and inaccurate. Should be 'Asking about the GPT Team shared subscriptions in the forum' to capture the full context. | Mistranslation: '拼车' means 'shared subscription/seat' not 'Car'. 'Team拼车' is 'Team shared subscription'. The translation 'Team Car' is incorrect and changes the meaning. | Misinterpretation of 'GPT Team shared subscriptions' as 'Team Car'. | Mistranslation of '拼车' (carpooling/shared subscription). Translating it as 'Car' implies a physical vehicle, whereas the context is about sharing a GPT Team account. |
| CohereLabs/tiny-aya-water | Title: Asking about LDC GPT Team carpooling issue |
轻微
[术语]
"Title: Asking about LDC GPT Team carpooling issue"
理由: Should be 'shared subscriptions' not 'carpooling'. While 'carpooling' is a metaphor used in Chinese, 'shared subscriptions' is more precise in English. | “carpooling” is literal for “拼车” and can be confusing; should use ‘shared subscription’ or similar in this context. | Terminology: 'Team' should be capitalized consistently as 'Team' (the product name). |
| Qwen/Qwen2.5-14B-Instruct | Title: Inquiry about carpooling for GPT Team within the site Content: Refers to carpooling purchased with LDC. Personal GPT accounts cannot be used for carpooling; one must obtain team qualifications through a domain email and card binding to participate in carpooling. Is this correct? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Ask about GPT Team Carpooling Issue Content: Refers to carpooling using LDC purchased by the Team. Personal GPT standard account cannot carpool; one needs to obtain team eligibility by binding a domain email and card. Is that correct? |
轻微
[术语]
"carpooling"
理由: The reference uses 'shared subscriptions' and 'join the share' which is more accurate terminology than 'carpool' for this context. | 'Carpooling' is a literal translation of '拼车'. 'Shared subscriptions' is the accurate term for the service being discussed. | Literal “carpooling” may confuse readers; in this context it’s community slang for shared subscription; still understandable but non-idiomatic.
严重
[准确性]
"LDC"
理由: Severe syntactic and semantic error. 'LDC' is the currency (forum credits) used to buy the 'Team shared subscription'. The translation suggests the Team purchased the LDC. | The sentence structure is confusing. The reference clarifies: 'Referring to the Team shared subscriptions bought with LDC (forum credits)' which is much clearer. | Inaccurate: '用LDC购买的Team拼车' means 'Team carpool bought with LDC (forum credits)' not 'LDC purchased by the Team'; the LDC is the currency used to buy, not purchased by Team | Does not explain that LDC is forum credits; slight loss of contextual meaning but not critical. |
| Qwen/Qwen3-14B | Domanda sul carpooling del team GPT all'interno del sito | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Asking about carpooling issues within the site's GPT Team Content: Refers to using LDC-purchased Team carpooling; individual GPT standard accounts cannot carpool; one must use a dedicated domain email + card binding to obtain team qualification and then carpool, is that correct? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Asking about GPT Team carpooling within the site Content: Refers to using LDC-purchased Team carpooling; individual GPT accounts cannot carpool and must use a domain email + card binding to obtain team qualification for carpooling, is this correct? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Asking about GPT Team Carpooling Issues Content: Are you saying that personal GPT keys cannot be used for carpooling and that only domain email and bound card can be used to obtain team qualifications for carpooling? |
严重
[准确性]
"Content: Are you saying that personal GPT keys cannot be used for carpooling and that only domain email and bound card can be used to obtain team qualifications for carpooling?"
理由: The source says 'GPT 普号' (standard GPT accounts), not 'keys'. This changes the technical meaning. | Misinterprets '普号' (standard accounts) as 'keys'; should be 'standard GPT accounts' not 'GPT keys' | Shifts from asking a question to rephrasing someone else’s statement (“are you saying”), and replaces “GPT 普号” (standard personal account) with “GPT keys”, which changes the meaning. | '拼车' mistranslated as 'carpooling' throughout; should be 'shared subscription'; 'LDC' (forum credits) not mentioned; '个人GPT普号' means 'personal standard GPT account' not 'personal GPT keys' |
| google/gemma-3-12b-it | Question about carpooling issues within the GPT Team |
严重
[准确性]
"Question about carpooling issues within the GPT Team"
理由: The translation completely omits the 'Content' section. Additionally, 'carpooling issues' is a literal translation of '拼车' which should be 'shared subscriptions' in this software context. | Only the title is translated; all detailed content about LDC credits, domain email, and card binding is omitted, causing substantial information loss. | Incomplete translation. Missing title context and content. Should include 'shared subscriptions' terminology and reference to LDC forum credits. | Missing Content translation entirely - only Title was translated; also 'carpooling' is used literally when it means 'shared subscription' in this context | The title is missing. |
| google/gemma-3-1b-it | Translation of '求问站内GPT Team拼车的问题' in en |
轻微
[准确性]
"Translation of '求问站内GPT Team拼车的问题' in en"
理由: The hypothesis is a meta-instruction describing a translation task rather than the actual translated content. It fails to translate the source text. | Hypothesis only states what is being translated rather than providing the actual translation. This is incomplete. | Meta description only; missing the actual English translation of the query about GPT Team shared subscriptions. | Hypothesis is a literal translation placeholder instead of actual translation of the source content |
| google/gemma-3-4b-it | Question: Regarding questions about the GPT Team carpooling within the station. |
严重
[流畅性]
"Question: Regarding questions about the GPT Team carpooling within the station."
理由: The sentence is redundant ('Question: Regarding questions') and awkward. 'Within the station' is a literal translation of '站内' which should be 'in the forum' or 'on the site'. | Redundant and unidiomatic phrasing (“Regarding questions about”); omits key details about LDC purchase, need for domain email and linked card, and the yes/no question being asked. | Redundant and awkward phrasing; should be 'Asking about the GPT Team shared subscriptions in the forum' per reference |
| google/translategemma-12b-it | Subject: Question about carpooling within the GPT Team Content: I'm asking about using LDC to purchase team carpools. Is it true that individual GPT accounts cannot be used for carpooling, and that you need a domain email address + bound card to obtain team eligibility in order to participate in carpooling? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Question about using GPT Team carpooling within the platform |
轻微
[准确性]
"within the platform"
理由: Mistranslates '拼车' as 'carpooling' when in this context it means 'shared subscription' or 'group sharing'. Should be 'GPT Team shared subscriptions' not 'carpooling'. | '站内' refers to 'within the forum/community', not necessarily a software 'platform'. 'In the forum' is more accurate. | "站内" refers to within the forum/site; "platform" is broader and less precise but generally understandable. |
| tencent/HY-MT1.5-1.8B | Translation of '求问站内GPT Team拼车的问题' in en |
严重
[准确性]
"Translation of '求问站内GPT Team拼车的问题' in en"
理由: Translation of 'GPT Team拼车的问题' as 'GPT Team shared subscriptions' is more accurate | Hypothesis is not a translation but a description of the task. The actual translation is missing. | The hypothesis is a meta-description string, not the actual translation of the source title. | Missing actual translation; should include asking about GPT Team shared subscriptions/carpooling in the forum. |
| tencent/HY-MT1.5-7B | Question about team carpooling on the platform | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Introduction: Are you still struggling to find out how to get started with Claude Code? Do you want to experience the joy of vibe coding but don't know where to begin? |
轻微
[准确性]
"Are you still struggling to find out how to get started with Claude Code?"
理由: Missing title and context. Hypothesis only provides partial content without the title '[Open Source] CCQ: A PowerShell script...' and incomplete foreword. | Only partially reflects the original title and foreword; omits explicit mention of Windows, PowerShell script, and “from zero to out‑of‑the‑box” setup. | Misinterpretation of the original Chinese text. | Major content missing: The hypothesis only translates the '前言' (Foreword) part but completely omits the main title '[开源]CCQ:一个 Windows 从零搭建 Claude Code 环境,到开箱即用的 Powershell 脚本' which is the core content. The reference includes the full title. |
| CohereLabs/tiny-aya-water | Title: 【Open Source】CCQ: A Windows Zero-to-One Setup Guide for Claude Code Environment, from Box to Ready Powershell Script | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | "en": "Title: [Open Source] CCQ: A Windows Environment Setup Script for Claude Code from Scratch to Ready-to-Use PowerShell\nContent: CCQ (Claude Code Quickstart) is an automation installer for setting up a Claude Code development environment within Windows PowerShell. Introduction Are you not a developer but want to experience the joy of coding and don't know where to start? Have you wanted to switch to Claude Code but didn't know how to configure it, what MCP to install, or which workflow to use, only to end up back at AIDE?", "zh-TW": "標題:【開源】CCQ:一個從零開始建立 Claude Code 環境的 Windows PowerShell 腳本,直到可以立即使用的狀態\n內容:CCQ(Claude Code 快速啟動)是一個在 Windows PowerShell 中執行的自動化安裝程式,用於設定 Claude Code 開發環境。前言 您還不是開發者,但想要體驗編程的樂趣卻不知從何下手嗎?您是否想轉換到 Claude Code,但不清楚如何進行配置、安裝什麼 MCP 或使用哪種工作流程,最後只能回到 AIDE?", "ja": "タイトル:[オープンソース] CCQ:Windows PowerShell でクレオードコードの開発環境をゼロからセットアップし、すぐに使えるスクリプト\nコンテンツ:CCQ(クレオードコード クイックスタート)は、Windows PowerShell 内で実行されるクレオードコード開発環境の自動インストーラーです。はじめに デベロッパーではないが、コーディングの楽しさを体験したいがどのように始めればよいかわからないですか?クレオードコードに切り替えたいが、どのように設定するか、どの MCP をインストールするか、どのワークフローを使用するかわからないため、結局 AIDE に戻ってしまうことはありませんか?", "ko": "제목: [오픈소스] CCQ: Windows PowerShell에서 Claude Code 개발 환경을 처음부터 세팅하고 바로 사용할 수 있는 스크립트\n내용: CCQ(Claude Code Quickstart)는 Windows PowerShell에서 실행되는 Claude Code 개발 환경 자동 설치 프로그램입니다. 서문 개발자가 아니지만 코딩의 즐거움을 경험해보고 싶으나 어디서 시작해야 할지 모르겠습니까? Claude Code로 전환하려고 하지만 어떻게 설정해야 하고 어떤 MCP를 설치하며 어떤 워크플로우를 사용해야 하는지 모른 채 결국 AIDE로 돌아가는 것은 아닙니까?", "de": "Titel: [Offene Quellen] CCQ: Ein Windows-Umgebungsskript für Claude Code von Grund auf bis hin zu einer sofort einsatzbereiten Umgebung mit PowerShell\nInhalt: CCQ (Claude Code Schnellstart) ist ein Automatisierungsinstaller zum Aufsetzen eines Claude Code Entwicklungsumfelds innerhalb von Windows PowerShell. Vorwort Sind Sie kein Entwickler, aber möchten Sie den Spaß am Programmieren erleben und wissen nicht, wo Sie beginnen sollen? 
Möchten Sie zu Claude Code wechseln, aber wissen nicht, wie Sie es konfigurieren, welche MCPs Sie installieren oder welchen Workflow Sie verwenden sollen, nur um letztendlich wieder bei AIDE zu landen?", "fr": "Titre : [Code Ouvert] CCQ : Un script pour construire un environnement de développement Claude Code sous Windows depuis zéro jusqu'à prêt à l'emploi avec PowerShell\nContenu : CCQ (Claude Code Quickstart) est un programme d'installation automatisé qui permet de mettre en place un environnement de développement Claude Code dans Windows PowerShell. Présentation Vous n'êtes pas développeur mais souhaitez-vous découvrir le plaisir du codage et ne savez pas par où commencer ? Voulez-vous passer à Claude Code mais ne savez pas comment le configurer, quel MCP installer ou quelle méthode de travail utiliser, pour finir par revenir à AIDE ?", "it": "Titolo: [Codice Aperto] CCQ: Uno script per impostare un ambiente di sviluppo Claude Code su Windows da zero fino all'utilizzo immediato con PowerShell\nContenuto: CCQ (Claude Code Quickstart) è uno strumento di installazione automatizzato che consente di configurare un ambiente di sviluppo Claude Code all'interno di Windows PowerShell. Introduzione Non sei un programmatore ma vorresti provare il piacere del codaggio e non sai da dove cominciare? Vuoi passare a Claude Code ma non sai come configurarlo, quale MCP installare o | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: [Open Source] CCQ: Building a Claude Code Environment from Zero on Windows to Ready-to-Use PowerShell Scripts\nContent: CCQ (Claude Code Quickstart) is an automated installer for setting up a Claude Code development environment in Windows PowerShell. Preface Are you not a developer but still want to experience the fun of vibe coding and don't know where to start? Are you considering switching to Claude Code but unsure about how to configure it or what MCPs to install, only to end up back at AIDE?", "zh-TW": "標題: [開放原始碼] CCQ:從零開始在Windows建立Claude Code環境,到開箱即用的PowerShell腳本\n內容: CCQ(Claude Code快速上手)是一個在Windows PowerShell中執行的自動安裝程式,用於設定Claude Code開發環境。序文 您不是開發人員但仍然想要體驗vibe編程的乐趣,卻不知從何開始嗎?您考慮轉換為使用Claude Code,但對於如何設定或安裝什麼MCP以及使用什麼工作流程感到困惑,最終又回到了AIDE嗎?", "ja": "タイトル: [オープンソース] CCQ: Windows 上からゼロで Claude Code エンビロンメントを構築し、すぐに使用できる PowerShell スクリプト\n内容: CCQ(Claude Code キックスタート)は、Windows PowerShell で実行される自動インストーラーであり、Claude Code 開発環境のセットアップを行います。前置き デベロッパではないが、vibe コードの楽しさを体験したいと思っているあなたは、どこから始めればよいか分かりませんか?Claude Codeに切り替えることを考えていますが、どのように設定するのか、どのような MCP をインストールすればよいか、どのワークフローを使用すればよいか分からず、結局 AIDE に戻ってしまうことはありませんか?", "ko": "제목: [오픈소스] CCQ: 윈도우에서 클로이드 코드 환경을 처음부터 구축하여 사용 가능한 파워쉘 스크립트\n내용: CCQ(클로이드 코드 빠른 시작)는 윈도우 PowerShell에서 실행되는 자동 설치 프로그램으로, 클로이드 코드 개발 환경 설정에 사용됩니다. 서문 개발자가 아닌데 여전히 vibe 코딩의 재미를 체험하고 싶지만 어디서부터 시작해야 할지 모르는 당신은 어떻게 하면 좋을까요? 클로이드 코드로 전환하려고 하지만 설정 방법이나 어떤 MCP를 설치해야 하는지, 어떤 워크플로를 사용해야 하는지 모른다고 느끼다가 결국 AIDE로 돌아가는 것은 아닐까요?", "de": "Titel: [Open Source] CCQ: Ein Windows-Basisumgebungsautomatisierter-Installator für Claude Code aus dem Nullpunkt bis zum sofort verwendbaren PowerShell-Skript\nInhalt: CCQ (Claude Code Schnellstart) ist ein automatischer Installer, der eine Claude Code-Entwicklungsumgebung im Windows-PowerShell ausführt. Vorwort Bist du kein Entwickler und möchtest den Spaß an Vibe-Coding erleben, weißt du aber nicht, wo du anfangen sollst? 
Bist du dabei, zu Claude Code wechseln zu überlegen, weißt du aber nicht, wie du es konfigurieren oder welche MCPs du installieren solltest, und landest am Ende wieder bei AIDE?", "fr": "Titre : [Open Source] CCQ : Construire un environnement de développement Claude Code à partir de zéro sur Windows pour des scripts PowerShell prêts à l’emploi\nContenu : CCQ (Quick Start Claude Code) est un instalateur automatique exécuté dans Windows PowerShell qui permet d’installer un environnement de développement Claude Code. Préface Vous n’êtes pas développeur et vous voulez expérimenter le plaisir de la programmation vibe mais ne savez pas par où commencer ? Vous envisagez de passer à Claude Code mais vous ne savez pas comment configurer votre environnement ou quelles MCP installer, et finissez-vous par revenir à AIDE ?", "it": "Titolo: [Open Source] CCQ: Costruisci un ambiente di sviluppo Claude Code da zero su Windows per script PowerShell pronti all'uso\nContenuto: CCQ (Quick Start Claude Code) è un installer automatico eseguito in Windows PowerShell che consente di configurare un ambiente di sviluppo Claude Code. Prefazione Se non sei uno sviluppat | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: [Open Source] CCQ: A Windows From Scratch Setup for Claude Code Environment to Ready-to-Use PowerShell Script | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: [Open Source] CCQ: Building a Claude Code environment from scratch on Windows to ready-to-use PowerShell scripts Content: CCQ (Claude Code Quickstart) is an automated installer for the Claude Code development environment that runs in Windows PowerShell. Introduction Are you still struggling because you're not a developer but want to experience the fun of vibe coding and don't know where to start? Are you still stuck trying to switch to Claude Code but don't know how to configure it, what MCPs to install, or which workflows to use, before ending up back at AIDE? | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: [Open Source] CCQ: A Windows environment setup from scratch to an out-of-the-box PowerShell script for Claude Code Content: CCQ (Claude Code Quickstart) is an automated installer for the Claude Code development environment that runs in Windows PowerShell. Introduction Are you still struggling not knowing where to start because you're not a developer but want to experience the fun of vibe coding? Are you still stuck wanting to switch to Claude Code but don't know how to configure it, what MCPs to install, or which workflows to use, only to end up going back to AIDE? | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Open Source] CCQ: A PowerShell Script to Build a Claude Code Environment from Scratch, Ready-to-Use | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: [Open Source] CCQ: A Windows Zero-to-Deployment Environment for Claude Code, to Out-of-the-Box PowerShell Scripts Content: CCQ (Claude Code Quickstart) is an automation installer for a Claude Code development environment that runs in Windows PowerShell. Introduction Are you not a developer but want to experience the fun of vibe coding and don’t know where to start? Are you trying to switch to Claude Code but don’t know how to configure it, install what MCP, use what workflow, and end up going back to AIDE? | Pass (no consensus errors) |
| google/gemma-3-1b-it | CCQ (Claude Code Quickstart) is an automated installation tool for running Claude Code in Windows PowerShell. It allows you to quickly get started with coding and experience the vibe coding fun. |
Major
[Accuracy]
"CCQ (Claude Code Quickstart) is an automated installation tool for running Claude Code in Windows PowerShell. It allows you to quickly get started with coding and experience the vibe coding fun."
Reason: Incomplete translation. The hypothesis summarizes the title and the first rhetorical question but completely omits the second rhetorical question regarding configuration, MCP, workflows, and returning to AIDE. | Missing details about the script's purpose. | Hypothesis provides only a partial translation. It omits the 'Foreword' section and the questions about non-developers and struggling with configuration, which are important parts of the source content. | Overall meaning is close, but it slightly shifts focus from "development environment" to just "running Claude Code" and adds "get started with coding" which is an interpretive addition not explicitly in the source segment. | Hypothesis is in English instead of Chinese (target language is zh). Also omits the entire content about the foreword and developer questions |
| google/gemma-3-4b-it | Title: [Open Source] CCQ: A Windows Environment for Building a Claude Code Environment from Scratch to Ready-to-Use PowerShell Scripts | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: [Open Source] CCQ: A guide to setting up a Claude Code environment from scratch on Windows, and using ready-to-use PowerShell scripts. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: [Open Source] CCQ: A Windows from scratch to ready-to-use PowerShell script for building a Claude Code environment | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of 'CCQ:一个 Windows 从零搭建 Claude Code 环境,到开箱即用的 Powershell 脚本' in en |
Major
[Accuracy]
"Translation of 'CCQ:一个 Windows 从零搭建 Claude Code 环境,到开箱即用的 Powershell 脚本' in en"
Reason: Translation does not match the reference | Hypothesis is not a translation but a description of the task. The actual translation is missing. | The hypothesis is a meta-description string, not the actual translation of the source title. | No English translation is supplied; should cover Windows, setting up Claude Code from scratch, and an out-of-the-box PowerShell script. |
| tencent/HY-MT1.5-7B | "en": "Title: [Open Source] CCQ: A Windows-based Quickstart for Setting Up a Claude Code Environment, from Zero to Ready-to-Use Powershell Scripts", "zh-TW": "Title: [開源] CCQ:一個基於Windows的快速入門指南,從零開始建立Claude Code環境,直至可立即使用的Powershell腳本", "ja": "Title: [オープンソース] CCQ:Windows向けのクラウド・コード環境構築ガイド、ゼロから使えるPowershellスクリプトまで", "ko": "Title: [오픈소스] CCQ: Windows 기반의 클라우드 코드 환경 설정 가이드, 초기 설정부터 즉시 사용 가능한 Powershell 스크립트까지", "de":="Titel: [Open Source] CCQ: Eine schnelle Einführungshilfe für die Einrichtung einer Claude Code-Umgebung unter Windows, von der Grundlagen bis zu sofort einsatzfähigen Powershell-Skripten", "fr":="Titre: [Open Source] CCQ : Guide rapide pour l’installation d’une environnement Claude Code sous Windows, de la base à des scripts Powershell prêts à l’emploi", "it":="Titolo: [Open Source] CCQ: Guida rapida per l’installazione di un ambiente Claude Code su Windows, dalle basi all’utilizzo immediato di script Powershell"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Stable and Fast Google Gemini Public Site | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Stable and Fast Google Gemini Public Site |
Minor
[Terminology]
"Stable and Fast Google Gemini Public Site"
Reason: Should be 'free API proxy site' not 'Public Site'. Reference uses more precise terminology consistent with the forum context. | Terminology: '公益站' should be 'free API proxy site' not 'Public Site' to maintain consistency with community terminology. | '公益站' in this context refers to a 'free API proxy site' maintained by the community, not a generic 'Public Site'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Stable and Fast Google Gemini Free Shared Site Content: Supported Models: models/gemini-2.0-flash gemini-2.0-flash-search Search Support models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation Image Generation Support... | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Stable and Fast Google Gemini Public Proxy Content: Supported Models: models/gemini-2.0-flash gemini-2.0-flash-search Support Search: models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation Support Drawing... |
Minor
[Terminology]
"Public Proxy"
Reason: '公益站' is consistently better translated as 'free API proxy site' in this dataset's context. | Inconsistent terminology: '公益站' means 'free API proxy site' not just 'public proxy'; reference uses 'free API proxy site' | “公益站” better rendered as “free API proxy site”; “Public Proxy” loses ‘free’ nuance. | The reference uses 'free API proxy site' which is more specific and accurate than 'Public Proxy'.
Minor
[Fluency]
"Support Drawing"
Reason: Should be 'supports searching' as per reference. | The sentence structure is broken. It should be 'models/X supports search' and 'models/Y supports image generation'. The current output reads like a list of commands rather than feature descriptions. | The reference's 'supports image generation' is more natural than 'Support Drawing'. |
| Qwen/Qwen3-14B | Title: Stable Fast Google Gemini Free Shared API Site Content: Supported models: models/gemini-2.0-flash gemini-2.0-flash-search supports search models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation supports image generation... | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Stable and Fast Google Gemini Free Shared API Site Content: Supported Models: models/gemini-2.0-flash, models/gemini-2.0-flash-search; Supports Search: models/gemini-2.0-flash-thinking-exp-01-21, models/gemini-2.0-flash-exp-image-generation; Supports Drawing... | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Stable Fast Google Gemini Free Shared API Site Supported Models: models/gemini-2.0-flash gemini-2.0-flash-search Supported Search models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation Supported Drawing... | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Stable and Fast Google Gemini Free Shared Site | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: Stable and Fast Public Proxy for Google Gemini Content: Supported Models: models/gemini-2.0-flash gemini-2.0-flash-search Supports search models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation Supports image generation... |
Minor
[Fluency]
"Supports search models/gemini-2.0-flash-thinking-exp-01-21"
Reason: Missing semicolons between items; reference has proper punctuation | Formatting and punctuation issues. Should use semicolons to separate model entries for clarity. | Missing punctuation between 'search' and list of models makes the sentence harder to parse compared with the reference. |
| google/gemma-3-1b-it | Support model: models/gemini-2.0-flash gemini-2.0-flash-search Support search models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation Support image generation... |
Minor
[Fluency]
"Support model: models/gemini-2.0-flash gemini-2.0-flash-search Support search models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation Support image generation..."
Reason: The translation is overly literal and lacks natural phrasing (e.g., 'Support model' instead of 'Supported models'). It also fails to translate the title '稳定快速 Google Gemini 公益站'. | Missing colons and proper formatting. | The hypothesis lacks proper punctuation and formatting. It should use semicolons or line breaks to separate different models, as shown in the reference. | English phrasing is ungrammatical and lacks needed separators; should be "Supported models:" and clearly distinguish which models support search or image generation. |
| google/gemma-3-4b-it | Title: Stable and Fast Google Gemini Public Proxy Content: Supported Models: models/gemini-2.0-flash gemini-2.0-flash-search supports search models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation supports drawing... | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Stable and Fast Public Proxy for Google Gemini Content: Supported Models: models/gemini-2.0-flash gemini-2.0-flash-search Supports search models/gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-exp-image-generation Supports image generation... | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Stable and Fast Google Gemini Public API Site |
Minor
[Terminology]
"Public API Site"
Reason: Should be 'free API proxy site' not 'Public API Site'. Reference correctly uses 'free API proxy site'. | "公益站" in this context is more precisely "free API proxy site"; "Public API" can imply official public API, not a community proxy. | Similar to Item 6, 'Public' misses the 'free/charity' nuance of '公益'. 'Free API proxy site' is more accurate for the context. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 稳定快速 Google Gemini 公益站' in en |
Major
[Accuracy]
"Translation of 'Title: 稳定快速 Google Gemini 公益站' in en"
Reason: Hypothesis is not a translation but a description of the task. The actual translation is missing. | The hypothesis is a meta-description string, not the actual translation of the source title. | Translation of 'Google Gemini 公益站' as 'Google Gemini free API proxy site' is more accurate | The hypothesis provides no English wording and omits “stable and fast Google Gemini free API proxy site” meaning. |
| tencent/HY-MT1.5-7B | Stable and Fast Google Gemini Public Site |
Major
[Accuracy]
"Public Site"
Reason: '公益站' in this community means 'free API proxy site'; omitting 'API proxy' and 'free' changes the functional meaning. | The translation omits critical information present in the reference ('free API proxy'). | Mistranslation of slang '公益站' (free API proxy site) as literal 'Public Site'. | The source '公益站' means 'free/public welfare site' or 'free API proxy site', not just 'Public Site'. The hypothesis misses the key meaning of 'free' or 'proxy' which is essential to the original meaning. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: OpenClaw 大龙虾 official ps1 installation script |
Minor
[Style]
"大龙虾 official"
Reason: Mixing Chinese nickname “大龙虾” with English “official” is slightly awkward; should be consistently in one language. | Inconsistent language mixing: '大龙虾' (Chinese nickname for OpenClaw) mixed with 'official' (English) - should be consistent. Also missing the '#' comment symbols from original. | Retains the Chinese nickname '大龙虾' (Big Lobster) for OpenClaw without translation or explanation, which is inappropriate for an English target audience. |
| CohereLabs/tiny-aya-water | Content: # OpenClaw's Windows installation script # Official recommended usage: # iwr -useb https://openclaw.ai/install.ps1 | iex | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: Official OpenClaw Giant Lobster ps1 Installation Script Content: # OpenClaw Windows installation script # Recommended usage by official: # iwr -useb https://openclaw.ai/install.ps1 | iex | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: OpenClaw Crab Official ps1 Installation Script Content: # OpenClaw Windows installation script # Recommended usage by official: # iwr -useb https://openclaw.ai/install.ps1 | iex |
Major
[Accuracy]
"Crab"
Reason: The reference correctly identifies this as 'The official ps1 installation script for OpenClaw' without the mistranslation 'Crab'. '大龙虾' is a nickname for OpenClaw, not literally 'big crab'. | “大龙虾” refers to lobster (OpenClaw), not crab; incorrect animal and branding. | Incorrect translation: '大龙虾' is slang for 'OpenClaw' but 'Crab' is wrong; should be 'OpenClaw' or 'OpenClaw (Lobster)' | Mistranslation of '大龙虾'. 'Lobster' is the nickname for OpenClaw. Translating it as 'Crab' is factually incorrect (wrong animal) and loses the specific community reference. |
| Qwen/Qwen3-14B | Script di installazione ufficiale OpenClaw Granchio PS1 | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Official OpenClaw PS1 installation script for Spiny Lobster Content: # OpenClaw Windows installation script # Recommended usage: # iwr -useb https://openclaw.ai/install.ps1 | iex | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Official PS1 Installation Script for OpenClaw Big Lobster Content: # OpenClaw Windows Installation Script # Recommended official usage: iwr -useb https://openclaw.ai/install.ps1 | iex | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: OpenClaw OpenDragonCrab Official PS1 Installation Script |
Major
[Accuracy]
"OpenClaw OpenDragonCrab Official PS1 Installation Script"
Reason: Incorrectly translates '大龙虾' (big lobster/OpenClaw nickname) as 'OpenDragonCrab'; should just be 'OpenClaw' or 'the official ps1 installation script for OpenClaw' | The source '大龙虾' is a nickname for OpenClaw (Big Lobster). Translating it as 'OpenDragonCrab' is a hallucination and incorrect terminology. | Incorrect: '大龙虾' is slang for 'OpenClaw' not literal 'OpenDragonCrab'; '官方' means 'official' not 'OpenDragonCrab Official' |
| google/gemma-3-12b-it | Title: OpenClaw Lobster Official ps1 Installation Script Content: # OpenClaw Windows installation script # Recommended usage: # iwr -useb https://openclaw.ai/install.ps1 | iex | Pass (no consensus errors) |
| google/gemma-3-1b-it | OpenClaw Official Windows Installation Script |
Major
[Accuracy]
"OpenClaw Official Windows Installation Script"
Reason: Incomplete translation. The hypothesis only translates the title and omits the content (the code comments and installation command). | Hypothesis is in English instead of Chinese (target language is zh). Also omits the content about the official usage command | The hypothesis translates '大龙虾' (big lobster/OpenClaw nickname) as 'Official' which is inaccurate. The reference correctly translates it as 'The official ps1 installation script for OpenClaw'. |
| google/gemma-3-4b-it | OpenClaw Lobster Official PS1 Installation Script | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Official OpenClaw Lobster PS1 Installation Script Content: # OpenClaw Windows installation script # Recommended usage: # iwr -useb https://openclaw.ai/install.ps1 | iex | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: OpenClaw Official ps1 Installation Script for Lobster |
Major
[Accuracy]
"for Lobster"
Reason: The source '大龙虾' is a nickname for the software 'OpenClaw'. Translating it as 'for Lobster' implies the script is for the crustacean animal, which is a severe context error. | Adds an incorrect term; "大龙虾" is a playful nickname for OpenClaw itself, not a separate "Lobster" entity; reference title omits it for clarity. | '大龙虾' is a nickname for OpenClaw, not literally 'Lobster'. Should be 'The official ps1 installation script for OpenClaw' not 'for Lobster'. |
| tencent/HY-MT1.5-1.8B | Login Translation in English |
Major
[Accuracy]
"Login Translation in English"
Reason: Hypothesis translates '登录' (login) which is not present in the source. The source is about an installation script, not login. | Translation of 'Login' is not required in the reference | The hypothesis is just a fragment and does not translate anything from the source; also unrelated to the given content about an installation script. | The hypothesis is completely unrelated to the source text (which is about an installation script) and appears to be a hallucinated label or error. |
| tencent/HY-MT1.5-7B | OpenClaw Official PS1 Installation Script for Big Lobster |
Major
[Accuracy]
"Big Lobster"
Reason: Source '大龙虾' (big lobster) is informal slang for OpenClaw. Reference correctly omits this nickname in formal title, using just 'OpenClaw'. | Mistranslation of nickname '大龙虾' (Big Lobster/OpenClaw) as a literal animal rather than the software name. | '大龙虾' is a nickname for OpenClaw, not literally 'Big Lobster'; adding 'Big Lobster' implies a separate product or mascot not present in reference. | The translation adds 'for Big Lobster' which is not present in the source or reference. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Banana Pro is the real top, teaching you how to turn articles into cartoon infographics | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Banana Pro is the real top, teaching you how to turn articles into cartoon infographics | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: Big Banana (Banana Pro) is really great, teaches you to turn articles into cartoon infographics Content: Found an extremely fun raw image prompt word that can transform articles into cartoon infographics; truly impressive understanding of Chinese and various elements | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Big Banana (Banana Pro) is really top-notch, teaching you to turn articles into cartoon infographics Content: Found a really fun image prompt that can transform articles into cartoon infographics, and the Chinese and various elements are truly impressive | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Grande Banana (Banana Pro) è davvero eccezionale, ti insegna qui a trasformare gli articoli in infografiche a fumetti | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Big Banana (Banana Pro) is truly awesome. Learn how to turn articles into cartoon infographics. |
Critical
[Accuracy]
"Content omitted"
Reason: The hypothesis completely omits the translation of the source Content paragraph. | Content field is completely missing in hypothesis | The hypothesis only translates the title and omits the entire body describing the image generation prompt and its capabilities, losing essential information. |
| Qwen/Qwen3.5-9B | Title: Big Banana (Banana Pro) is truly top-notch; here's how to turn articles into cartoon infographics Content: Discovered a very fun image generation prompt that can transform articles into cartoon infographics. The Chinese understanding and element recognition are incredibly impressive. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Banana Pro (Banana) is really cool, teaches you to turn articles into cartoon information graphics. |
Minor
[Style]
"Banana Pro (Banana) is really cool,"
Reason: Slightly different wording from reference but meaning is conveyed | The source '大香蕉(Banana Pro)' uses 'Big Banana' as a nickname. The hypothesis redundantly says 'Banana Pro (Banana)', losing the 'Big' nuance and sounding repetitive. | “真的顶” is strong slang (“really awesome” / “top-tier”); “really cool” slightly weakens the tone but is acceptable. |
| google/gemma-3-12b-it | Title: Da Xiang Jiao (Banana Pro) is truly awesome; learn how to turn articles into cartoon infographics Content: I discovered an amazing image generation prompt that can transform articles into cartoon infographics. The Chinese and element understanding is incredibly impressive. |
Minor
[Style]
"Da Xiang Jiao (Banana Pro) is truly awesome; learn how to"
Reason: Pinyin 'Da Xiang Jiao' retained vs reference's 'Banana Pro' only | Translating the nickname '大香蕉' phonetically as 'Da Xiang Jiao' is unnecessary when the English equivalent 'Banana Pro' is provided and understood. It disrupts the flow. | Pinyin romanization unnecessary. Should be 'Banana Pro' directly. | Adds 'learn how to' instructional nuance not explicit in the source title, though overall intent is similar. |
| google/gemma-3-1b-it | Translation of '大香蕉' with LinuxDo community context |
Minor
[Accuracy]
"Translation of '大香蕉' with LinuxDo community context"
Reason: No actual translation is given; it only describes the task and omits rendering "大香蕉" (Banana Pro) in English within the community context. | The hypothesis is a meta-instruction describing a translation task rather than the actual translated content. It fails to translate the source text. | Hypothesis only states what is being translated rather than providing the actual translation. The source mentions 'Banana Pro' but the hypothesis does not provide a proper translation. | Hypothesis is incomplete and unclear - it only mentions '大香蕉' but the source is about 'Banana Pro' and the content about image generation prompts |
| google/gemma-3-4b-it | Title: Big Banana (Banana Pro) is really great, teaching you how to turn articles into cartoon infographics Content: I discovered a very fun live prompt, this prompt can turn articles into cartoon infographics, the Chinese understanding and various elements are truly amazing. | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Da Xiang Jiao (Banana Pro) is truly excellent; learn how to turn articles into cartoon infographics. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Banana Pro is really great, teaching you how to turn articles into cartoon infographics | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '大香蕉' with LinuxDo community context", "zh-TW": "LinuxDo社區中的‘大香蕉’翻譯”, "ja": "LinuxDoコミュニティにおける“大きなバナナ”の翻訳”, "ko": "LinuxDo 커뮤니티의 ‘큰 바나나’ 번역”, "de": "LinuxDo Gemeinschaftskontext von ‘Großer Banane’", "fr": "Traduction de ‘Grand Banane’ dans le contexte de la communauté LinuxDo", "it": "Traduzione di ‘Grande Banana’ nel contesto della comunità LinuxDo" } |
Major
[Accuracy]
""en": "Translation of '大香蕉' with LinuxDo community context""
Reason: No concrete translation of “大香蕉” is given; in this context it refers to the product name “Banana Pro”, not literally “big banana”. | The hypothesis outputs a JSON object with meta-descriptions for multiple languages instead of the actual English translation of the source text. | Translation of '大香蕉' as 'Banana Pro' is required | Hypothesis is not a proper translation but a description of the task. The actual translation is missing. Also '大香蕉' (big banana) should be translated as 'Banana Pro' based on the reference. |
| tencent/HY-MT1.5-7B | "en": "Title: The Banana Pro is truly outstanding – it teaches you how to turn articles into cartoon infographics.", "zh-TW": "Title: Banana Pro真的非常出色——它教会你如何将文章转换成卡通信息图!", "ja": "Title: Banana Proは本当に優れています——それは記事をカートゥーンインフォグラフィックに変換する方法を教えてくれます。", "ko": "Title: Banana Pro는 정말 뛰어납니다——그것은 기사를 카툰 인포그래픽으로 변환하는 방법을 가르쳐줍니다。", "de": "Titel: Banana Pro ist wirklich hervorragend – es zeigt Ihnen, wie Sie Artikel in Cartoon-Infografiken umwandeln können。」, "fr": "Titre: Banana Pro est vraiment exceptionnel – il vous apprend comment transformer des articles en infographies animées。」, "it": "Titolo: Banana Pro è davvero eccezionale – vi insegna come trasformare gli articoli in infografiche animate。"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Since the cc channel is gone, my work efficiency has dropped significantly. |
Major
[Accuracy]
"Since the cc channel is gone"
Reason: Misinterpretation of 'Claude Code channels' as 'cc channel'. | Incomplete translation. Should be 'Ever since the Claude Code channels are gone' to match reference and capture full meaning. | Mistranslation: 'cc渠道' refers to 'Claude Code channels' not just 'cc channel'. The translation is too literal and loses the specific reference to Claude Code. |
| CohereLabs/tiny-aya-water | Since the cc channel is gone, my work efficiency has dropped sharply. |
Major
[Accuracy]
"Since the cc channel is gone"
Reason: Mistranslation: 'cc渠道' refers to 'Claude Code channels' not just 'cc channel'. 'codex' should be capitalized. The reference mentions 'kiro and other channels' which is missing. | Incomplete translation - only translates title, not content. Also 'cc channel' should be 'Claude Code channels' for clarity. | Source refers specifically to Claude Code and other channels like kiro; “cc” alone is ambiguous and omits mention of those channels. |
| Qwen/Qwen2.5-14B-Instruct | Title: Since the cc channel is gone, productivity has dropped sharply Content: Since channels like kiro have disappeared, everyone's public stations for Claude are no longer available; after switching to Codex, productivity has dropped sharply. I see that Codex solves problems at an incredibly fast speed. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Since the cc channel disappeared, work efficiency has dropped sharply. |
Minor
[Accuracy]
"Since the cc channel disappeared, work efficiency has dropped sharply."
Reason: Missing title translation: source has title '自从cc渠道没了后,工作效率直线下降' which should be translated | Condenses both title and content, omits references to kiro, free sites, switch to Codex, and ironic tone about speed; key details lost. | The hypothesis is missing the title entirely. The source has a title that should be translated. The reference provides the complete translation with title. | Severe omission. The hypothesis only translates the title and misses the entire content paragraph which explains the context (kiro channels down, switch to Codex, sarcasm about speed). |
| Qwen/Qwen3-14B | Da che il canale cc è sparito, l'efficienza del lavoro è crollata | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Since the cc channel is gone, productivity has plummeted. |
Critical
[Accuracy]
"Since the cc channel is gone, productivity has plummeted."
Reason: Only the title is translated and the detailed content about loss of Claude, use of Codex, and sarcastic comment on Codex speed is entirely omitted. | Hypothesis is missing the title and only provides a partial translation. Should include full title and content. | Ambiguous. 'cc' refers to 'Claude Code'. Expanding it improves clarity. |
| Qwen/Qwen3.5-9B | Title: Since the cc channel is gone, work efficiency has plummeted straight down Content: Since channels like kiro have disappeared, everyone's public service stations no longer have claude; after switching to codex, work efficiency has dropped significantly. I see that codex solves problems really "fast" | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Since cc channels have been removed, productivity has dropped significantly. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: Work efficiency has dropped sharply since the cc channel disappeared Content: Since channels like kiro disappeared, everyone's public service stations have lost Claude and switched to Codex, resulting in a sharp drop in work efficiency. I think Codex is truly "fast" when solving problems. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Since the kiro and other channels have been discontinued, everyone's charity platform has disappeared, and Codex was introduced after that, work efficiency dropped sharply, I think Codex is really fast. |
Major
[Accuracy]
"everyone's charity platform has disappeared"
Reason: Hypothesis is in English instead of Chinese (target language is zh). Also 'cc渠道' should be 'Claude Code channels' not just 'channels', and '公益站' should be 'free API proxy sites' not 'charity platform' | "公益站" refers to free API proxy sites, not "charity platform"; this mistranslates a key community term. | Mistranslation of '公益站'. In this context, it refers to 'free API proxy sites' maintained by the community, not a 'charity platform' that has 'disappeared'. The source says the Claude access on these sites is gone, not the sites themselves. | The hypothesis misses the sarcasm in the original. The phrase '解决问题速度是真"快"' with quotes around '快' indicates sarcasm - Codex is slow, not fast. The reference correctly captures this with 'so fast' in quotes. |
| google/gemma-3-4b-it | Since the cc channel disappeared, work efficiency has decreased sharply. After kiro and other channels disappeared, many seniors’ public stations were gone, replaced by Claude, and then work efficiency dropped sharply. I find that Codex solves problems very quickly. | 通过 (无共识错误) |
| google/translategemma-12b-it | Since the cc channel was discontinued, work efficiency has dropped significantly. |
Major
[Accuracy]
"Since the cc channel was discontinued, work efficiency has dropped significantly."
Reason: The hypothesis only translates the title (and simplifies it) and completely omits the content which explains the shift from Claude to Codex and the sarcasm regarding Codex's speed. | The hypothesis only translates the title but omits the content. The source includes detailed explanation about the efficiency drop that should be translated. | Source explicitly mentions kiro and other channels going down, loss of Claude on free sites, and switch to Codex with ironic comment on speed. Hypothesis compresses to only "cc channel" and omits Codex comparison, reducing key contextual information. |
| google/translategemma-4b-it | Since the cc channel disappeared, work efficiency has decreased significantly. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '自从cc渠道没了后,工作效率直线下降' in en", "zh-TW": "自從kiro等渠道消失了之後,各位佬的公益站也沒有了claude,改為codex之後,工作效率直線下降,我覺得codex解決問題的速度真是‘快’”, "ja": "自從ccチャンネルがなくなってから、効率の良い作業が急激に下がった。Claudeもなくなり、Codexに変わってからは、仕事の効率が急落した。Codexが問題を解決する速度は本当に「速い」と感じる", "ko": "CC 채널이 사라진 이후로 효율적인 작업이 급격히 떨어졌습니다. Claude도 없어지고 Codex로 바뀌자 업무 효율이 급격히 낮아졌습니다. Codex가 문제를 해결하는 속도는 정말 '빠르다'고 느낍니다.", "de": "Seit die CC-Kanäle verschwunden sind, hat sich die Effizienz der Arbeit stark verringert. Claude ist auch weg, und nachdem es auf Codex umgestellt wurde, hat sich die Arbeitsproduktivität rapide verringert. Ich finde, dass Codex die Probleme schnell löst", "fr": "CC 채널이 사라진 이후로 효율성은 크게 떨어졌습니다. Claude도 없어지고 Codex로 전환한 후에는 업무 효율이 급격히 감소했습니다. Codex가 문제를 빠르게 해결한다고 생각합니다", "it": "Desde que os canais do CC desapareceram, a eficiência do trabalho diminuiu drasticamente. Com o Claude fora de cena e com a mudança para o Codex, a produtividade do trabalho caiu significativamente. Acho que o Codex resolve problemas rapidamente."}, "input": "标题:自从cc渠道没了后,工作效率直线下降" {"en": "Translation of '自从cc渠道没了后,工作效率直线下降' in en", "zh-TW": "自從kiro等渠道消失了之後,各位佬的公益站也没了claude,改為codex後,工作效率直線下降,我覺得codex解决问题速度是真“快”", "ja": "自從kiroなどのチャンネルがなくなった後、皆さんの公益サイトにもClaudeがなくなり、Codexに変更された後、効率の良い作業が急激に下がりました。Codexが問題を解決する速度は本当に「速い」と感じます", "ko": "Kiro 등의 채널이 사라진 이후에 여러분들의 공익 사이트에도 Claude가 사라지고 Codex로 변경된 후에 효율적인 작업이 급격히 떨어졌습니다. Codex가 문제를 해결하는 속도는 정말 '빠르다'고 느낍니다.", "de": "Nachdem Kiro und andere Kanäle verschwunden waren, hatte auch das öffentliche Projekt-Forum keine Claude mehr. Nach dem Wechsel zu Codex sank die Effizienz der Arbeit dramatisch. Ich finde, dass Codex die Probleme schnell löst", "fr": "Kiro 및 기타 채널이 사라진 후에 여러분의 공익 사이트에도 Claude가 사라졌고, Codex로 전환한 후에 효율성이 급격히 떨어졌습니다. Codex가 문제를 해결하는 속도는 정말 '빠릅니다'", "it": "Dopo che i canali di Kiro e altri sono scomparsi, anche il forum pubblico non aveva più Claude. Dopo la transizione a Codex, l'efficienza del lavoro è diminuita drasticamente. 
Penso che Codex risolva i problemi velocemente."} |
Major
[Accuracy]
""en": "Translation of '自从cc渠道没了后,工作效率直线下降' in en""
Reason: The hypothesis outputs a complex JSON structure with meta-descriptions and translations in multiple languages, failing to provide a clean English translation of the source text. | Hypothesis is not a translation but a description of the task. The actual translation is missing. | Translation of 'cc渠道' as 'Claude Code channels' is more accurate | The English string is only a meta-description and does not actually translate the sentence. |
| tencent/HY-MT1.5-7B | Since the cc channel was discontinued, work efficiency has dropped significantly. | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: I'm working on a large project that requires AI to organize a 200GB data library... The numbers I've downloaded from the station and my own registered numbers are struggling to keep up, and success rates are constantly dropping at night. CC is no longer running, and I'm asking kind-hearted people to share their base_url and API key. |
Major
[Accuracy]
"The numbers I've downloaded from the station and my own registered numbers"
Reason: Missing title entirely: The source has a title '【谢谢,目前已足够】有没有人有好用的Codex公益站都给我😭' which is completely missing in the hypothesis. The reference includes the full title. | Misinterprets “号” as numbers instead of “accounts”, leading to semantic error. | Mistranslation of '号' (accounts). 'Numbers' refers to digits, whereas 'accounts' is the correct meaning in this context. |
| CohereLabs/tiny-aya-water | Content: I'm working on a large project, using AI to organize a 200GB data set... The numbers I've downloaded from the site and registered accounts just aren't holding up, success rates are dropping at night, can't run cc anymore, looking for someone with a base_url and API key, thanks. |
Major
[Accuracy]
"the numbers I've downloaded from the site"
Reason: Mistranslation of '号' (accounts). 'Numbers' is incorrect; it should be 'accounts'. | Mistranslation: '号' means 'accounts' not 'numbers'. 'cc跑不动' means 'Claude Code can't run' not 'can't run cc'. The reference mentions 'Codex free API proxy sites' but hypothesis doesn't clarify this context. | “号” means accounts, not numbers; this affects understanding of resource limits. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Thank you, currently enough] Is there anyone with a good Codex free shared API site? Please give me 😭 Content: I am working on a large project and need AI to organize a 200GB database... The accounts downloaded from the site and those registered by myself are not sufficient, success rate is dropping at night, cc can't run anymore, please leave base_url and API key if you're kind enough, thank you. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Thanks, It's Already Enough] Does Anyone Have a Reliable Codex Public Proxy? 😭 Content: I am working on a large project to整理 an 200GB database using AI... The accounts from the station and those I registered myself can't keep up, and the success rate has been decreasing at night. The cc is not running anymore, so if anyone could leave their base_url and API key, that would be greatly appreciated. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: [Thank you, it's already enough] Does anyone have a good Codex public proxy? Please give me 😭 Content: I'm working on a big project that requires using AI to organize a 200GB database... The accounts downloaded from the site and the ones I registered myself are struggling to support it. The success rate at night has been declining, cc can't run anymore. Please kind people leave base_url and API key, thank you. | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: [Thanks, currently sufficient] Does anyone have a good Codex free API site? 😭 Content: I'm working on a large project and need to organize a 200GB database using AI... The accounts downloaded from the site plus my own registered ones aren't enough; success rates keep dropping at night, and CC can't handle it anymore. Please leave your base_url and API key if you're kind enough. Thanks. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: [Thank you, currently sufficient] Does anyone have any useful Codex free shared API sites? Please share them with me 😭 Content: I'm working on a large-scale project and need to use AI to organize a 200GB database... The numbers downloaded from the site and those registered myself are insufficient. Success rates drop at night, CC can't run anymore. Kindly leave your base_url and API key. Thanks. |
Minor
[Terminology]
"Codex free shared API sites"
Reason: Should more precisely be “Codex free API proxy sites” to align with domain term. | Should be 'free API proxy sites' to match reference terminology | 'Free API proxy sites' is the preferred term. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: "Thank you, that's enough for now" Have you anyone with a usable Codex public proxy? Please let me know 😭 | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: [Thank you, that’s enough for now] Does anyone have any good Codex public proxy sites to share? 😭 Content: I’m working on a large project and need to use AI to organize a 200GB library… The accounts downloaded from the sites and those I registered myself are struggling to keep up, and the success rate has been declining at night. CC is not running anymore. Please kind souls leave your base_url and API key. Thank you. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Thank you for the information about the Codex公益站. I am working on a large project that needs to download a 200GB dataset...the number of downloads from the platform is not enough to support me, and the registration numbers keep dropping, the cc server is no longer running, please leave the base_url and API key for those who can help, thank you. |
Major
[Accuracy]
"Thank you for the information about the Codex公益站."
Reason: The hypothesis misinterprets the source. It incorrectly states 'Thank you for the information' when the source is asking for help. The title indicates '[Thanks, I have enough now]' meaning the request has been fulfilled, not that they're thanking for information. | Hypothesis is in English instead of Chinese (target language is zh). Also '公益站' is not translated properly and the meaning is altered - 'cc' should be 'Claude Code' | Mistranslation of the title. The source title '【谢谢,目前已足够】' means 'Thanks, I have enough now' (indicating the request is closed), whereas the hypothesis interprets it as thanking someone for information. Also, '公益站' is left untranslated. | Title meaning is "Thanks, I have enough now" and asking for Codex free API proxy sites; hypothesis turns it into thanks for information and leaves "公益站" untranslated. |
| google/gemma-3-4b-it | "en": "Title: [Thank you, currently enough] Does anyone have a useful Codex public proxy I can use?😭\nContent: I am working on a large project to organize a 200GB database using AI... The accounts downloaded from the site and my own registered accounts cannot support it, and the success rate has been decreasing at night. CC is not running, please ask kind people to leave base_url and API key, thank you.", "zh-TW": "標題:[謝謝,目前已足夠] 有沒有人有實用的Codex公益站可以給我用嗎😭\n內容:我正在進行一個大型工程,要用AI整理一個200GB的資料庫…從站裡下載的號還有自己註冊的號難以支撐,晚上成功率一直在下降,CC跑不動了,求好心人留下base_url和API key,謝謝。", "ja": "タイトル:[ありがとうございます、現在は十分です] 誰か役に立つCodex公益站を教えてください😭\nコンテンツ:大規模なプロジェクトでAIを使って200GBのデータベースを整理しています…。サイトからダウンロードしたアカウントと自分の登録したアカウントでは不十分で、夜間での成功率は低下し続けています。CCが動かず、優しい方にお願いします。base_urlとAPIキーを残してください。ありがとうございます。", "ko": "제목: [감사합니다, 현재 충분합니다] 유용한 Codex 공용 프록시를 공유해 주실 분 있나요😭\n내용: 저는 대규모 프로젝트에서 AI를 사용하여 200GB 데이터베이스를 정리하고 있습니다...사이트에서 다운로드한 계정과 제 계정으로는 부족하여 성공률이 밤에 계속 떨어지고 있으며, CC가 작동하지 않습니다. 친절한 분들께 base_url과 API 키를 남겨주시면 감사하겠습니다.", "de": "Titel: [Vielen Dank, derzeit ist es ausreichend] Hat jemand einen nützlichen Codex Proxy-Server für mich?\nInhalt: Ich arbeite an einem großen Projekt, um mit KI eine 200 GB große Datenbank zu organisieren... Die von der Seite heruntergeladenen Accounts und meine eigenen registrierten Accounts reichen nicht aus, und die Erfolgsrate sinkt nachts kontinuierlich. Der CC läuft nicht, bitte hinterlassen Sie mir base_url und API Key, danke.", "fr": "Titre : [Merci, il y en a suffisamment pour le moment] Quelqu’un aurait-il un Codex Proxy utile à partager avec moi ?😭\nContenu : Je travaille sur un grand projet pour organiser une base de données de 200 Go à l’aide de l’IA... Les comptes téléchargés depuis la plateforme et mes propres comptes ne suffisent pas, et le taux de réussite diminue constamment la nuit. 
Le CC ne fonctionne pas, veuillez laisser votre base\_url et votre clé API, merci.", "it": "Titolo: [Grazie, al momento è sufficiente] Qualcuno ha un Codex Proxy utile da condividere con me😭\nContenuto: Sto lavorando a un grande progetto per organizzare una base di dati di 200 GB utilizzando l'intelligenza artificiale... Gli account scaricati dal sito e i miei account registrati non sono sufficienti e il tasso di successo diminuisce costantemente di notte. Il CC non funziona, per favore lascia base_url e chiave API, grazie." } <|im_end|> | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: [Thank you, currently sufficient] Does anyone have any good Codex public API sites they can share with me? Content: I'm working on a large project that requires using AI to organize a 200GB database... The accounts downloaded from the sites and the ones I registered myself are not enough to handle the load, and the success rate has been decreasing at night. I can't run CC anymore. Please kind souls leave your base_url and API key. Thank you. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: 【Thank you, currently enough】Does anyone have a useful Codex free shared API site to share? 😭 Content: I am working on a large project to use AI to organize a 200GB data library... The accounts downloaded from the site and my own registered accounts cannot support it, and the success rate has been decreasing at night. CC is not running, please ask kind people to leave base_url and API key, thank you. |
Minor
[Terminology]
"Free Shared API Site"
Reason: "公益站" better rendered as "free API proxy site"; "shared API" is vague. | Should be 'free API proxy site' not 'free shared API site'. Reference correctly uses 'free API proxy sites'. | 'Free API proxy site' is a more precise translation for '公益站' in this technical context. |
| tencent/HY-MT1.5-1.8B | Translation of '谢谢,目前已足够' in en |
Major
[Accuracy]
"Translation of '谢谢,目前已足够' in en"
Reason: Hypothesis translates only part of the title ('谢谢,目前已足够') but misses the first part '有没有人有好用的Codex公益站都给我😭'. The reference shows the full title should be translated. | The hypothesis is a meta-description string, not the actual translation of the source title. | No English phrase is given; should express thanks and that the amount is now sufficient (e.g. “Thanks, I have enough now”). | Translation of '谢谢,目前已足够' as 'Thanks, I have enough now' is more accurate |
| tencent/HY-MT1.5-7B | Title: [Thank you, it’s already enough] Does anyone have a useful Codex Public Welfare Site to share with me? | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: A harsh critique on Dragon Crab Content: What do you think about the harsh critiques on Dragon Crab online? |
Major
[Accuracy]
"Dragon Crab"
Reason: Misinterpretation of 'OpenClaw' as 'Dragon Crab'. | 'Dragon Crab' is incorrect. Should be 'Outrageous opinions about OpenClaw'. '龙虾' is a nickname for OpenClaw, not 'Dragon Crab'. | Misinterprets “龙虾” (lobster, nickname for OpenClaw) as “Dragon Crab”, changing the referenced product. | Complete hallucination/mistranslation. '龙虾' (Lobster) is the nickname for 'OpenClaw'. 'Dragon Crab' is incorrect and confusing. | Mistranslation: '龙虾' is the nickname for 'OpenClaw' (a tool), not 'Dragon Crab'. The translation uses a literal translation that doesn't convey the actual meaning. |
| CohereLabs/tiny-aya-water | Title: About the Rant on Crabs |
Critical
[Accuracy]
"Crabs"
Reason: “龙虾” here is a nickname for OpenClaw, not generic crabs; completely wrong referent. | The translation incorrectly translates '龙虾' (lobster) as 'Crabs' and completely omits the 'Content' section of the source text, failing to convey the question asked to the community. | Mistranslation: '龙虾' is the community nickname for 'OpenClaw', not 'Crabs'. '暴论' means 'outrageous opinions' not 'Rant'. | Completely wrong translation. '龙虾' (lobster/OpenClaw) mistranslated as 'Crabs'. Should be 'Outrageous opinions about OpenClaw'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Controversial Opinions on Lobsters Content: Seeing controversial opinions about lobsters online, what do friends think? | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Rant about Lobster Content: What do old friends think about the rant on lobster online? |
Major
[Accuracy]
"Lobster"
Reason: The hypothesis mistranslates '龙虾' as 'Lobster'. The reference correctly identifies it as 'OpenClaw', which is the actual project name. | Inconsistent terminology: '龙虾' is slang for 'OpenClaw' but 'Lobster' is the literal translation; reference uses 'OpenClaw' | Mistranslation of '暴论' (outrageous opinions/hot takes) as 'Rant', losing the specific nuance of controversial statements. | “龙虾” here is the nickname for OpenClaw, not generic lobster; should retain product/app reference. |
| Qwen/Qwen3-14B | Un acceso commento sui granchi | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Hyperbolic discussion on Lobster Content: Seeing hyperbolic discussions about lobster online, what do you think? | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Absurd theories about lobsters Content: Seeing the absurd theories online about lobsters, what do you think, fellow LinuxDo users? | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | About the Rant on Lobsters |
Major
[Accuracy]
"About the Rant on Lobsters"
Reason: Should be 'Outrageous opinions about OpenClaw'; '龙虾' is a nickname for OpenClaw, not literal lobsters | '龙虾' here refers to the software 'OpenClaw', not the animal 'Lobsters'. '暴论' means 'outrageous opinion' or 'controversial take', not necessarily a 'rant'. The translation completely misinterprets the subject. | “龙虾” is a nickname for OpenClaw, not actual lobsters; hypothesis loses the OpenClaw reference and interprets literally. | Incorrect: '龙虾' is slang for 'OpenClaw' not literal 'lobsters'; '暴论' means 'outrageous/extreme opinions' not 'rant' |
| google/gemma-3-12b-it | Title: A Radical Treatise on Lobster Content: Having seen a radical treatise on lobster online, what do you guys think? |
Major
[Accuracy]
"Lobster"
Reason: Mistranslation. '龙虾' (Lobster) is a nickname for OpenClaw. Should be 'Outrageous opinions about OpenClaw'. | 龙虾 here refers to OpenClaw, a product, not literal lobster; translating literally loses the intended referent. | '暴论' means 'outrageous opinion' or 'hot take'. 'Treatise' implies a formal written work, which is incorrect. 'Lobster' should be translated as 'OpenClaw' or kept as contextually understood, but 'Treatise on Lobster' sounds like a biological study. | 'Lobster' is a nickname for OpenClaw but reference uses 'OpenClaw'; 'Radical Treatise' vs 'Outrageous opinions' - different meaning |
| google/gemma-3-1b-it | Regarding the discussion about shrimp, how do you see it? |
Major
[Accuracy]
"shrimp"
Reason: The hypothesis fails to recognize that '龙虾' (lobster) is a nickname for OpenClaw in this community context. It literally translates as 'shrimp' which is incorrect. The reference correctly identifies it as 'OpenClaw'. | Here "龙虾" is a nickname for OpenClaw, not literal shrimp; translation loses the product reference and community-specific meaning. | Hypothesis incorrectly translates '龙虾' (OpenClaw) as 'shrimp' - this is a well-known term in the community referring to OpenClaw. Also hypothesis is in English instead of Chinese (target language is zh) |
| google/gemma-3-4b-it | Title: Absurd Arguments About Lobster Content: What do you think about the absurd arguments online about lobster? |
Major
[Accuracy]
"Lobster"
Reason: 'Lobster' is slang for OpenClaw; should be 'Outrageous opinions about OpenClaw' per reference | '龙虾' here refers to the software 'OpenClaw', not the animal 'Lobster'. Translating it as 'Lobster' loses the specific reference. | Source “龙虾” here refers to OpenClaw (nickname), but reference clarifies context; title should mention OpenClaw explicitly to avoid generic lobster meaning. | The term 'lobster' does not accurately translate the source's context, which refers to 'OpenClaw'. |
| google/translategemma-12b-it | Title: A Heated Debate About Lobsters Content: I saw a heated debate online about lobsters. What do you guys think? |
Major
[Accuracy]
"Lobsters"
Reason: The hypothesis misses the context that '龙虾' (lobster) is a nickname for OpenClaw. Should be 'Outrageous opinions about OpenClaw' not 'A Heated Debate About Lobsters' | The hypothesis mistranslates '龙虾' (Lobster, the nickname for OpenClaw software) as actual lobsters (the animal). '暴论' (outrageous opinions/shocking statements) is weakened to 'Heated Debate'. The content is also completely omitted. | "龙虾" here refers to the OpenClaw product/community, not literal lobsters; translation to animal loses specific software reference. |
| google/translategemma-4b-it | Title: Absurd Claims about Lobsters Content: What do you guys think about the absurd claims about lobsters I saw online? |
Major
[Accuracy]
"Lobsters"
Reason: Major error. '龙虾' (Lobster) is the nickname for the software 'OpenClaw'. The translation suggests the text is about the animal, completely missing the context. | '龙虾' is a nickname for OpenClaw, not literally 'Lobsters'. Should be 'Outrageous opinions about OpenClaw' not 'Absurd Claims about Lobsters'. | "龙虾" in this community context refers to OpenClaw, not literal lobsters; mistranslation changes topic completely. |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾' in en |
Major
[Accuracy]
"Translation of '龙虾' in en"
Reason: The hypothesis does not supply an English equivalent; here “龙虾” refers to the product “OpenClaw”, not a literal lobster. | The hypothesis is a meta-description string, not the actual translation of the term. | Hypothesis translates '龙虾' (lobster) literally, but in this context it refers to 'OpenClaw' (a software tool). The reference correctly translates it as OpenClaw. |
| tencent/HY-MT1.5-7B | Title: Radical Opinions about Lobsters | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Let's talk about GitCortex and OMO, CCW, CCG core differences | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Let's talk about GitCortex and OMO, CCW, CCG core differences | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: "Open Source" Explain the core differences between GitCortex and OMO, CCW, CCG Content: Without understanding what this project is about and how it differs from existing mature solutions, whether it's just reinventing the wheel again? So I'm starting a new thread to explain. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: ‘Open Source’ Explain the Core Differences Between GitCortex and OMO, CCW, CCG Content: Haven’t figured out what this project is for, where its differences lie compared to existing mature solutions, or if it’s just another wheel-reinvention. So I’m starting a new thread to explain. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: Open Source, let's talk about the core differences between GitCortex and OMO, CCW, CCG | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Let's discuss GitCortex and OMO, CCW, CCG core differences Content: I haven't figured out what this project is actually about, where its difference lies from existing mature solutions, or if it belongs to repeated reinvention of the wheel? So I'm opening a new thread to explain it. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: What is the core difference between GitCortex and OMO, CCW, CCG? Content: Not clear what this project does or how it differs from existing mature solutions; is it just reinventing the wheel? So I'm posting separately to explain. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: "Open Source" Discussing GitCortex vs OMO, CCW, CCG Core Differences |
Minor
[Style]
""Open Source""
Reason: Reference adds clarifications in parentheses (CCW = Claude Code Web, CCG = Claude Code Workflow) which provides helpful context | Should use brackets '[Open Source]' to match reference format | Uses quotation marks instead of brackets; minor formatting/style deviation from typical tag-like usage. | The source uses brackets【开源】which acts as a tag. Using quotes changes the meaning slightly, brackets are preferred for tags. |
| google/gemma-3-12b-it | Title: "Open Source" - Let's Discuss the Core Differences Between GitCortex and OMO, CCW, and CCG Content: I haven't figured out what this project actually does, or how it differs from existing mature solutions, or whether it's just reinventing the wheel repeatedly. So I'm starting a separate post to explain. | Pass (no consensus errors) |
| google/gemma-3-1b-it | Translation of "开源" in en |
Minor
[Accuracy]
"Translation of "开源" in en"
Reason: Hypothesis only states what is being translated rather than providing the actual translation. This is incomplete and does not evaluate the translation quality. | The hypothesis is a meta-instruction describing a translation task rather than the actual translated content. It fails to translate the source text. | No English translation is provided; only a meta description. "开源" should be rendered as "Open Source" in this context. | Hypothesis is a literal translation placeholder instead of actual translation of the source content |
| google/gemma-3-4b-it | Title: "Open Source" Let's talk about GitCortex and OMO, CCW, and CCG core differences. What is the difference between them? |
Minor
[Fluency]
""Open Source" Let's talk about GitCortex and OMO, CCW, and CCG core differences. What is the difference between them?"
Reason: Repetitive and slightly clunky. The reference expands the acronyms which is helpful. | Redundant and awkward; reference provides clearer structure with abbreviation explanations | The sentence is redundant and less fluent compared to the reference. |
| google/translategemma-12b-it | Title: "Open Source" - Let's talk about the core differences between GitCortex and OMO, CCW, and CCG. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: "Open Source" - Explain the core differences between GitCortex and OMO, CCW, and CCG. | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '开源' in en |
Major
[Accuracy]
"Translation of '开源' in en"
Reason: The hypothesis is a meta-description string, not the actual translation of the term. | Hypothesis translates only '开源' (open source) but misses the rest of the title '讲一下GitCortex与OMO,CCW,CCG的核心区别是什么'. The reference shows the full title should be translated. | No actual English word is provided; should be “open source” or bracketed “[Open Source]” in this title context. | Translation of '开源' as 'Open Source' is correct |
| tencent/HY-MT1.5-7B | "en": "Title: Discussing GitCortex and OMO, CCW, CCG: What are the key differences?", "zh-TW": "标题:探讨GitCortex与OMO、CCW、CCG的主要差异", "ja": "タイトル:GitCortexとOMO、CCW、CCGの主な違いについて", "ko": "제목: GitCortex와 OMO, CCW, CCG의 핵심 차이점은 무엇인가", "de":="Titel: Was ist der Hauptunterschied zwischen GitCortex und OMO, CCW, CCG?", "fr":="Titre: Discussion sur GitCortex et OMO, CCW, CCG : Quelles sont les principales différences ?", "it":="Titolo: Discutiamo di GitCortex e OMO, CCW, CCG: Quali sono le principali differenze?"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: GGBOOM's Guide on Using sub2api Public Proxy |
Minor
[Terminology]
"GGBOOM's Guide on Using sub2api Public Proxy"
Reason: Again, 'free API proxy site' is the precise term for '公益站' in this context. | Terminology: '公益站' should be 'free API proxy site' not 'Public Proxy'. Also '食用说明' means 'usage guide/instructions' not just 'Guide'. | Should be 'Usage instructions for the GGBOOM free API proxy site' to match reference terminology. |
| CohereLabs/tiny-aya-water | Title: GGBOOM Public Proxy Station Usage Instructions | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: Usage Instructions for the GGBOOM Free Shared API Site Content: Many people don't understand how to use openai calls on sub2api type free shared sites. Honestly, I didn't get it at first either; I had to read the source code to figure it out. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: How to Use GGBOOM Public Proxy Content: Many people don't understand how to call openai with sub2api type public proxy. To be honest, I didn't get it either at first; I had to read the source code with AI to figure it out. |
Minor
[Terminology]
"Public Proxy"
Reason: Inconsistent: '公益站' means 'free API proxy site' not just 'public proxy'; reference uses 'free API proxy site' | “公益站” better rendered as “free API proxy site”; “Public Proxy” omits the ‘free’ implication. | Should be 'free API proxy sites'. | The reference uses 'free API proxy site' which is more specific and accurate than 'Public Proxy'. |
| Qwen/Qwen3-14B | Istruzioni per l'uso del sito gratuito GGBOOM API | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: GGBOOM Free Shared API Site Usage Guide Content: Many people don't understand how to use the OpenAI calls on sub2api type free shared API sites. Honestly, I didn't understand it at first either; reading the source code with AI helped me get it. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: GGBOOM Free Shared API Site Usage Guide Content: Many users don't understand how to call OpenAI on sub2api type free shared API sites. Honestly, I didn't get it at first either; only after reading the source code did an AI help me understand. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Free Shared API Site Usage Guide |
Major
[Accuracy]
"Free Shared API Site Usage Guide"
Reason: Incorrect: '公益站' means 'free API proxy site' not 'Free Shared API Site'; missing 'GGBOOM' specific reference | Should be 'Usage instructions for the GGBOOM free API proxy site'; loses the specific site name and context | Misses the specific brand name 'GGBOOM' which is in the source title, reducing specificity. |
| google/gemma-3-12b-it | Usage Instructions for the GGBOOM Public Proxy Site |
Major
[Accuracy]
"Usage Instructions for the GGBOOM Public Proxy Site"
Reason: Only the title is translated; the explanatory content about sub2api-type sites and OpenAI calls is entirely omitted. | The content is missing. | The translation completely omits the 'Content' section. |
| google/gemma-3-1b-it | Translation of 'GGBOOM公益站的食用说明' in en |
Minor
[Accuracy]
"Translation of 'GGBOOM公益站的食用说明' in en"
Reason: The hypothesis is a meta-instruction describing a translation task rather than the actual translated content. It fails to translate the source text. | Hypothesis only states what is being translated rather than providing the actual translation. This is incomplete. | No translation given; only a task description. Should convey "Usage instructions for the GGBOOM free API proxy site" including the playful "食用说明" nuance. | Hypothesis is a literal translation placeholder instead of actual translation of the source content |
| google/gemma-3-4b-it | Title: GGBOOM Free Shared API Site Usage Instructions Content: Many people don't understand the sub2api type free shared API site OpenAI call. To be honest, I didn't understand it at first either, and only figured it out by letting AI read the source code. |
Minor
[Fluency]
"sub2api type free shared API site OpenAI call"
Reason: The term 'Free Shared' is redundant and less fluent compared to the reference's 'free API proxy site'. | 'how to make OpenAI calls' or 'OpenAI API usage' is more natural than 'OpenAI call'. | Phrase is grammatically awkward; should be “how to call OpenAI on sub2api-type free API proxy sites”. | Awkward phrasing; should be 'how to make OpenAI calls on sub2api-type free API proxy sites' per reference |
| google/translategemma-12b-it | Title: Usage Instructions for the GGBOOM Public Proxy Site Content: Many users don't understand how to use the OpenAI API with sub2api type public proxy sites. To be honest, I didn't understand it at first either, but after having AI read the source code, I finally understood. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: GGBOOM Free Shared API Site Usage Instructions | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: GGBOOM公益站的食用说明' in en |
Major
[Accuracy]
"Translation of 'Title: GGBOOM公益站的食用说明' in en"
Reason: Hypothesis is not a translation but a description of the task. The actual translation is missing. | The hypothesis is a meta-description string, not the actual translation of the source title. | No English translation is present; should convey “Usage instructions for the GGBOOM free API proxy site” (with playful ‘usage/eating’ nuance if preserved). | Translation of 'GGBOOM公益站的食用说明' as 'Usage instructions for the GGBOOM free API proxy site' is more accurate |
| tencent/HY-MT1.5-7B | Title of GGBOOM Public Welfare Site: Instructions for Use | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudicated result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: It's pollen season soon, any recommendations for dust mask? Electric car riding | Major [Accuracy] "dust mask"<br>Reason: The source asks for '护目镜' (goggles/eye protection). The hypothesis translates it as 'dust mask' (face/respiratory protection), which is a different item. \| Misinterpretation of 'goggles' as 'dust mask'. \| Inaccurate. Source asks for '护目镜' (goggles/eye protection), not dust mask. Reference correctly identifies 'goggles'. \| Mistranslation: '电动车' means 'electric bike/scooter' not 'electric car'. Also '护目镜' means 'goggles/eye protection' not just 'dust mask'. \| Source specifically asks for 护目镜 (goggles); “dust mask” changes the protection gear type. |
| CohereLabs/tiny-aya-water | Title: It's almost pollen season, any recommendations for goggles? Electric car riding | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: Pollen season is coming up, recommend some goggles for electric bike riding Content: Beijing, I saw a post today saying that we're about to enter pollen season and need recommendations for dust-proof equipment. My current thought is to buy a full-face helmet, not sure if this will work. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Pollen season is coming soon, recommend some goggles, and electric bike riding Content: bj, today saw a tweet saying that the pollen season is about to start, looking for recommendations on dust protection gear. Now thinking of buying a full-face helmet, not sure if that will work. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Sta per arrivare alla stagione delle polveri, potresti consigliare degli occhiali protettivi per andare in bicicletta elettrica? | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Pollen season is coming soon; eye protection recommended, e-bike riding Content: bj, today I saw a post saying pollen season is approaching and asked for recommendations on dust-proof gear. My current idea is to buy a full-face helmet, but I'm not sure if that will work. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: The pollen season is almost here, any recommendations for goggles? Riding an electric bike. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: It's almost pollen season, recommend some protective eyewear. Electric bike riding. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: Pollen season is coming soon, recommend some goggles, and electric vehicle riding. | Major [Accuracy] "recommend some goggles, and electric vehicle riding."<br>Reason: Incomplete and unclear translation. Missing context about Beijing location and anti-dust gear recommendations. \| Missing Content translation entirely; also 'recommend some goggles' is incomplete - should be 'looking for goggles recommendations' \| The translation completely omits the 'Content' section. The title grammar is also poor ('and electric vehicle riding' hangs loosely). \| Misinterprets structure: the title asks for goggle recommendations for riding an electric bike during pollen season; hypothesis treats 'electric vehicle riding' as a separate item and omits the request context. |
| google/gemma-3-1b-it | Immediately enter pollen season, recommend some dust protection gear. | Major [Accuracy] "Immediately enter pollen season, recommend some dust protection gear."<br>Reason: Hypothesis is in English instead of Chinese (target language is zh). Also omits the location 'bj' (Beijing) and the content about the full-face helmet consideration \| The hypothesis is incomplete and lacks context. It misses 'for riding an electric bike' and the specific mention of 'Beijing' (bj). The source also mentions 'goggles' specifically, not just general dust protection gear. \| Incomplete translation. The hypothesis only covers a fragment of the title and omits the content entirely. It also misses the location context 'bj' (Beijing) and the specific request for 'goggles' mentioned in the title. |
| google/gemma-3-4b-it | Title: Spring Allergy Season is Coming Soon, Eye Protection Recommended, Electric Vehicle Riding | Major [Fluency] "Spring Allergy Season is Coming Soon, Eye Protection Recommended, Electric Vehicle Riding"<br>Reason: Awkward title structure; reference version 'Pollen season is coming, looking for goggles recommendations for riding an electric bike' is more natural \| The title is overly verbose and less fluent compared to the reference. \| Title is unnatural and fragmented; misses ‘pollen season’ and clear request for goggle recommendations for e-bike riding. |
| google/translategemma-12b-it | Title: Pollen season is coming soon, recommend some goggles for riding electric vehicles. | Major [Accuracy] "recommend some goggles for riding electric vehicles."<br>Reason: The hypothesis only translates the title and omits the content which specifies the location (Beijing) and the user's specific idea (full-face helmet). \| The hypothesis only translates the title but omits the content. The source includes location context and detailed question that should be translated. \| Source asks for recommendations for anti-dust gear generally and mentions idea of buying a full-face helmet; hypothesis narrows to only goggles, omitting broader protective gear and helmet consideration. |
| google/translategemma-4b-it | Title: Spring allergy season is coming soon, recommend some eye protection gear, electric car riding | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Login | Major [Accuracy] "Login"<br>Reason: Translation of 'Login' is not required in the reference \| Hypothesis is completely unrelated to the source. The source is about pollen season and goggles for cycling, not login. \| The word “Login” is unrelated to the Chinese title and does not translate any part of it. \| The hypothesis is completely unrelated to the source text (which is about pollen season and goggles) and appears to be a hallucinated label. |
| tencent/HY-MT1.5-7B | "en": "Title: Pollen season is approaching soon – recommendations for goggles when riding an electric bike.", "zh-TW": "标题:花粉季节即将来临——骑电动自行车时推荐佩戴护目镜!", "ja": "タイトル:花粉シーズンがもうすぐ始まります——電動自転車に乗るときはゴーグルのご使用をお勧めします!", "ko": "제목: 곧 꽃가루 시즌이 시작됩니다 — 전동자전거를 탈 때는 고글을 추천합니다!", "de":="Titel: Die Pollensaison naht bald – Empfehlungen für Schutzbrillen beim Fahren eines Elektrofahrrads.", "fr":="Titre : La saison des pollens arrive bientôt – Nous recommandons l’utilisation de lunettes de protection lors de la conduite d’un vélo électrique.", "it":="Titolo: La stagione dei pollini sta per iniziare – Si consiglia di utilizzare occhiali da protezione quando si pedala in bicicletta elettrica."}\|im_end\|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: Originally I wanted to subscribe myself to claude code but after searching I needed phone number, email address, overseas payment card... So I ended up going directly to the transit station... | Major [Accuracy] "transit station"<br>Reason: Literal translation of '中转站'. In this context, it refers to an 'API proxy site' or 'reseller', not a physical transit station. \| Missing title: The source title '中转站都是自己维护的号池么' (Do API proxy sites all maintain their own account pools?) is completely missing. The hypothesis only has content. \| “中转站” here means API proxy/relay site, not a physical transit station; mistranslation of technical term. |
| CohereLabs/tiny-aya-water | Title: Are all the accounts at the transit station maintained by themselves? | Major [Accuracy] "transit station"<br>Reason: Mistranslation of '中转站' - should be 'API proxy sites' not 'transit station'. Also 'maintained by themselves' is unclear; should be 'maintain their own account pools'. \| Mistranslation: '中转站' means 'API proxy site' not 'transit station'. The context is about API proxy sites maintaining account pools, not a transit station. \| “中转站” here refers to API proxy sites, not physical transit stations; wrong domain term. |
| Qwen/Qwen2.5-14B-Instruct | Title: Are all the relay stations their own number pools? Content: Originally wanted to subscribe to claude code myself but needed phone, email, overseas payment card... In the end, I just used the relay station... How do they manage so many accounts on the relay station... Wouldn't the number pool be consumed? | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Is all the relay station just a pool of self-maintained accounts? | Minor [Terminology] "relay station"<br>Reason: Inconsistent: '中转站' means 'API proxy site' not 'relay station'; reference uses 'API proxy sites' \| Should be 'API proxy sites' as per reference. \| “中转站” in context is closer to “API proxy site / relay site”; “relay station” is generic and may be less clear.<br>Major [Accuracy] "Is all the relay station just a pool of self-maintained accounts?"<br>Reason: The translation is a bit clunky. 'Do API proxy sites all maintain their own account pools?' is much clearer. 'Relay station' is a literal translation of '中转站' which works but 'proxy sites' is the functional equivalent. \| The hypothesis is missing the title translation entirely. The reference provides: 'Title: Do API proxy sites all maintain their own account pools?' which is the correct translation of the title. \| Missing title translation: source has title '中转站都是自己维护的号池么' which should be translated |
| Qwen/Qwen3-14B | Tutti gli account sulla stazione di trasbordo vengono gestiti autonomamente? | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Are the proxy pools self-maintained? | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Are all relay stations self-maintained account pools? Content: Originally wanted to subscribe to Claude Code myself, but searching revealed requirements for mobile phone, email, overseas payment card... In the end, I went straight to Relay Station... How do they manage so many accounts at Relay Station?... Will the account pool also be consumed? | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Are all relay stations maintained by oneself? | Pass (no consensus errors) |
| google/gemma-3-12b-it | Are all relay stations self-maintained account pools? | Major [Accuracy] "Are all relay stations self-maintained account pools?"<br>Reason: The title is missing. \| The translation completely omits the 'Content' section. \| 'relay stations' is literal translation of '中转站' but reference uses 'API proxy sites'; missing Content translation |
| google/gemma-3-1b-it | Are you referring to the 'mid-transfer station' or something else? | Minor [Accuracy] "'mid-transfer station'"<br>Reason: Mistranslation of "中转站", which in this context means API proxy sites, not "mid-transfer station" in a physical sense. \| The hypothesis is a question asking for clarification rather than a translation of the source content. It does not translate the actual source material about API proxy sites and account pools. \| Hypothesis is in English instead of Chinese (target language is zh). Also mistranslates '中转站' (API proxy site) as 'mid-transfer station' and omits the content about subscribing to Claude Code and account pool concerns \| Complete mistranslation. The hypothesis turns the title into a question about the term 'mid-transfer station', whereas the source is asking if proxy sites maintain their own account pools. It fails to convey the actual meaning of the text. |
| google/gemma-3-4b-it | Are all the proxy stations self-maintained number pools? | Minor [Terminology] "number pools"<br>Reason: Should be 'Do API proxy sites all maintain their own account pools?' per reference for clarity \| '号池' refers to 'account pools', not 'number pools' (which implies phone numbers only). \| “号池” refers to account pools; “number pools” is literal and less accurate than “account pools” for this context. |
| google/translategemma-12b-it | Are all the accounts used by relay stations self-maintained? | Pass (no consensus errors) |
| google/translategemma-4b-it | Are all the accounts used at the transfer station self-maintained? | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '中转站都是自己维护的号池么' in en | Major [Accuracy] "Translation of '中转站都是自己维护的号池么' in en"<br>Reason: Hypothesis is not a translation but a description of the task. The actual translation is missing. \| The hypothesis lacks the English rendering; should ask whether proxy/relay sites maintain their own account pools. \| The hypothesis is a meta-description string, not the actual translation of the source title. \| Translation of '中转站都是自己维护的号池么' as 'Do API proxy sites all maintain their own account pools?' is more accurate |
| tencent/HY-MT1.5-7B | Are all the account pools managed by the transfer stations themselves? | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Using GGBOOM's free API at openclaw | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Using GGBOOM's free API at openclaw | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Usa l'API del sito GGBOOM gratuito per OpenClaw | Critical [Accuracy] "Usa l'API del sito GGBOOM gratuito per OpenClaw"<br>Reason: The translation is in Italian, but the target language is Chinese (source) to English (implied by other items and reference). This is a complete language mismatch. \| Hypothesis is in Italian, not English. The target language is zh (Chinese), and the hypothesis should be an English translation. This is a complete failure to provide the correct language output. \| Wrong language: Hypothesis is in Italian, not Chinese (target language) |
| Qwen/Qwen2.5-7B-Instruct | Title: Use GGBOOM Public Proxy API for OpenClaw Content: What I've figured out is that the key point is: ai.qaq.al does not support old openai-completions; it needs to be changed to openai-responses | Minor [Terminology] "Public Proxy"<br>Reason: Inconsistent: '公益站' means 'free API proxy site' not just 'public proxy'; reference uses 'free API proxy site' \| Should be 'free API proxy site'. \| The reference uses 'free API proxy site' which is more specific and accurate than 'Public Proxy'. |
| Qwen/Qwen3-14B | Usare l'API del sito proxy GGBOOM con openclaw | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Using GGBOOM free shared API site's api for openclaw Content: After testing, the key point is: ai.qaq.al does not support old openai-completions; needs to be changed to openai-responses | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Use GGBOOM free shared API site's api for openclaw Content: After my experimentation, the key point is: ai.qaq.al does not support old openai-completions and needs to be changed to openai-responses | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Use GGBOOM's public proxy API with OpenClaw Content: I got it working here. The key point is: ai.qaq.al does not support old OpenAI completions and needs to be changed to OpenAI responses. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Use the GGBOOM public proxy API for OpenClaw | Major [Accuracy] "Use the GGBOOM public proxy API for OpenClaw"<br>Reason: The content is missing. \| The translation completely omits the 'Content' section. \| Only the title is translated; the key technical detail about changing openai-completions to openai-responses is omitted. |
| google/gemma-3-1b-it | Translation of '将GGBOOM公益站的api 用于openclaw' in en | Minor [Accuracy] "Translation of '将GGBOOM公益站的api 用于openclaw' in en"<br>Reason: Only a meta description; does not actually translate the sentence, which should express using the GGBOOM free API proxy site API with OpenClaw. \| The hypothesis is a meta-instruction describing a translation task rather than the actual translated content. It fails to translate the source text. \| Hypothesis only states what is being translated rather than providing the actual translation. This is incomplete. \| Hypothesis is a literal translation placeholder instead of actual translation of the source content |
| google/gemma-3-4b-it | Title: Using the GGBOOM public proxy API for openclaw Content: I have configured this locally, the key point is: ai.qaq.al does not support old openai-completions and needs to be changed to openai-responses | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Use the GGBOOM public API for openclaw Content: After some troubleshooting, the key point is that ai.qaq.al does not support the old openai-completions and needs to be changed to openai-responses. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Using the GGBOOM public API on openclaw Content: I have configured this, the key point is: ai.qaq.al does not support old openai-completions, needs to be changed to openai-responses | Minor [Terminology] "public API"<br>Reason: Should be 'free API proxy site' not 'public API'. Reference correctly uses 'free API proxy site'. \| "公益站" again better as "free API proxy site"; "public API" could imply official API rather than community proxy. \| Consistent error pattern: '公益站' should be 'free API proxy site' rather than 'public API' to reflect the unofficial/charity nature. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 将GGBOOM公益站的api 用于openclaw' in en | Major [Accuracy] "Translation of 'Title: 将GGBOOM公益站的api 用于openclaw' in en"<br>Reason: No English text is given; should indicate using the GGBOOM free API proxy site’s API with/in OpenClaw. \| The hypothesis is a meta-description string, not the actual translation of the source title. \| Translation of '将GGBOOM公益站的api 用于openclaw' as 'Using the API from GGBOOM free API proxy site for OpenClaw' is more accurate \| Hypothesis is not a translation but a description of the task. The actual translation is missing. |
| tencent/HY-MT1.5-7B | Title: Using the GGBOOM Public Welfare Site’s API in OpenClaw | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Registering email for chatgpt using your xyz domain name, US node, you can get team for one month 😄. Solved got team car team | Major [Accuracy] "破案了got team车队 / Solved got team car team"<br>Reason: Missing context about 'GPT Team' and 'shared subscription groups' \| Mistranslation of slang '车队' (shared subscription group) as 'car team', resulting in nonsense. \| Mistranslation: '车队' means 'shared subscription groups' not 'car team' \| Mistranslation of '破案了got team车队' - should be 'Case closed on the GPT Team shared subscription groups' not 'got team car team' \| Mistranslation of slang; refers to solving how to join GPT Team shared subscription groups, not a literal 'car team' or 'got team'. |
| CohereLabs/tiny-aya-water | Title: Using your own server to run chatbot models, like running Docker containers on Kubernetes cluster, but with more control over the hardware and network. | Critical [Accuracy] "Using your own server to run chatbot models, like running Docker containers on Kubernetes cluster, but with more control over the hardware and network."<br>Reason: Completely unrelated content; does not convey any information about registering ChatGPT with .xyz email, US node, or Team subscription. \| Complete mistranslation - hypothesis has no relation to source content about registering ChatGPT with .xyz domain for Team subscription \| Hypothesis completely misrepresents source content about ChatGPT registration with .xyz domain email and Team subscription, instead describing server/Docker infrastructure unrelated to source \| The hypothesis does not match the source or reference content. \| The hypothesis is a complete hallucination unrelated to the source text. The source discusses registering for ChatGPT Team using .xyz domains and US nodes, while the hypothesis talks about running Docker on Kubernetes. |
| Qwen/Qwen2.5-14B-Instruct | Title: Register for ChatGPT using an email from a xyz domain application, US node, can claim team one month 😄. Mystery solved got team fleet. Content: Fiddled around and found a pattern, but registration, card binding, domain registration still have costs, a few dollars worth, directly easier at the seafood market. | Minor [Accuracy] "team fleet"<br>Reason: Misinterprets “车队” which in this context refers to shared subscription groups, not an actual fleet of teams. \| The translation is vague and does not specify 'GPT Team' or 'shared subscription groups' as in the reference. \| Misinterprets '车队' (carpool/shared group) as 'fleet' (vehicles), losing the context of shared subscription groups.<br>Major [Accuracy] "seafood market"<br>Reason: Literal translation of '海鲜市场' fails to convey the cultural reference to 'Xianyu' (a second-hand trading platform), which is clear in the reference. \| Translates “海鲜市场” literally, missing the specific platform name Xianyu that the source and reference imply. \| Incomplete translation. Should be 'Xianyu (the seafood market)' - a specific platform name, not just 'seafood market'. Also missing context that this refers to buying accounts. |
| Qwen/Qwen2.5-7B-Instruct | Title: Register ChatGPT with xyz domain email, US node, can get team for one month 😄. Got team 车队. Content: Figured out the pattern myself, but there's still costs involved like registering, binding cards, and domains, just a few bucks, direct seafood market is more convenient | Major [Accuracy] "got team车队"<br>Reason: Mistranslates “破案了got team车队” which refers to cracking the case about GPT Team shared subscription groups; current text is unclear and mixes languages. \| Incorrect translation of '破案了got team车队' which should refer to solving a case related to team subscriptions, not just 'Got team 车队'. \| Mistranslation: 'got team车队' is incorrectly translated. The reference shows 'Case closed on the GPT Team shared subscription groups', meaning the user solved the mystery about GPT Team group subscriptions. The hypothesis incorrectly keeps '车队' (fleet) which is wrong. \| Mixed language code-switching not properly handled; should be 'Got team shared subscription groups' or similar, not literal translation with Chinese characters |
| Qwen/Qwen3-14B | Registrati su chatgpt con una mail del dominio xyz, nodo USA, puoi ottenere un team per un mese 😄. Risolto, got team flotta | Major [Accuracy] "Risolto, got team flotta"<br>Reason: Missing context about GPT Team and shared subscription groups. \| The hypothesis only translates the Title and omits the entire Content section. Additionally, 'flotta' is an incorrect translation for '车队' in this context (which refers to a shared account group/car-pooling, not a naval fleet), and 'got team' is left untranslated. \| Incorrect translation of '破案了got team车队'. Should be 'Case closed on GPT Team shared subscription' not 'got team flotta'. 'Flotta' (fleet) is wrong; '车队' refers to a group/team, not a fleet. \| "破案了got team车队" refers to solving the mystery about GPT Team shared subscription groups; "flotta" (fleet) and omission of the shared-subscription nuance make this inaccurate. \| Translation is in Italian instead of Chinese (target language zh) |
| Qwen/Qwen3.5-4B | Title: Email registered via xyz domain to chatgpt, US node, can claim team for one month 😄. Solved! Got team fleet. | Major [Accuracy] "破案了got team车队"<br>Reason: Misinterprets '车队' (carpool/shared subscription group) as 'fleet' in a literal sense, losing the slang meaning of a shared account group. \| Mistranslation of '破案了got team车队' - should be 'Case closed on the GPT Team shared subscription groups' not 'Got team fleet' \| The hypothesis omits the idiomatic meaning of “破案了” (case closed / mystery solved) and “车队” (shared subscription group), rendering it as “Got team fleet” which is misleading and unnatural in this context. \| Incomplete translation - '车队' (fleet/group) is missing in hypothesis |
| Qwen/Qwen3.5-9B | Title: Register chatgpt using an email registered with xyz domain, US node, can claim a free team for one month 😄. Solved: got team fleet Content: I figured out some patterns myself, but registration, card binding, and domain registration still cost money—a few yuan. It's more convenient to do this directly on Xianyu. | Major [Accuracy] "got team车队"<br>Reason: Mistranslation of '车队' (carpool/shared group) as 'fleet' (vehicles), losing the context of shared account groups. \| Hypothesis translates as 'got team fleet' but reference shows 'GPT Team shared subscription groups' - misses the meaning of '车队' as a shared subscription group \| Mistranslation: '车队' refers to shared subscription groups, not 'fleet' \| “车队” denotes a carpool-style group purchase or shared group; rendering it as “fleet” is stiff, and the translation should convey the shared-subscription-group meaning. \| Missing context about GPT Team and shared subscription groups. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Using xyz domain for email registration to apply for ChatGPT, from an American server, you can get a team for one month 😊. I've cracked it, got the team squad. | Major [Accuracy] "got team车队"<br>Reason: Misinterprets '车队' (car pool/shared group) as 'squad'. In this context, it refers to a shared subscription group, not a team of people. \| Mistranslation: 'got team车队' refers to joining GPT Team shared subscription groups, not a 'team squad' \| Mistranslation of '破案了got team车队'. Should be 'Case closed on the GPT Team shared subscription groups' not 'I've cracked it, got the team squad' |
| google/gemma-3-12b-it | Registered chatgpt with an xyz domain email, US node, you can get a team membership for one month 😄. Solved the case, got team fleet. | Major [Accuracy] "got team fleet"<br>Reason: Mistranslation of slang '车队' (carpool/shared group) as 'fleet' (vehicles), losing the context of shared subscription groups. \| Mistranslation: '车队' (fleet/group purchase) should be 'team subscription group' or similar, not 'fleet' \| Mistranslation of '破案了got team车队'. Should be 'Case closed on the GPT Team shared subscription groups' not 'got team fleet'. The term '车队' (fleet/group) refers to shared subscription groups, not literal fleet. \| "车队" in this context refers to shared Team subscription groups, not a literal "fleet"; meaning is significantly distorted \| Missing context about GPT Team and shared subscription groups. |
| google/gemma-3-1b-it | Translation of '用xyz 域名申请的邮箱注册chatgpt' with Linux | Major [Accuracy] "Translation of '用xyz 域名申请的邮箱注册chatgpt' with Linux"<br>Reason: Only a fragment is translated, missing most of the source meaning (US node, Team month, costs, Xianyu). Adds unrelated 'with Linux' not in source. \| The hypothesis is a meta-commentary describing a translation task rather than providing the actual translation of the source text. It fails to convey any meaning from the source. \| Hypothesis is not a proper translation - it's a placeholder text that doesn't translate the source content at all \| Hypothesis is incomplete and incoherent. It appears to be a fragment rather than a full translation. Missing content and unclear connection to 'Linux'. \| The translation is incomplete and does not cover the entire source text. |
| google/gemma-3-4b-it | Title: Email registration for xyz domain, ChatGPT, US node, can get a team for one month 😄. Case closed, got team fleet. | Major [Accuracy] "got team fleet"<br>Reason: Mistranslation: '车队' (fleet) should be '车队/群' referring to shared subscription groups, as per reference 'GPT Team shared subscription groups' \| Mistranslation of '破案了got team车队'. Should be 'Case closed on the GPT Team shared subscription groups' not 'got team fleet'. The term '车队' refers to shared subscription groups, not fleet. \| Literal translation of '车队' (fleet) fails to capture the slang meaning of 'shared subscription groups' or 'carpooling' in this context. \| ‘车队’ here refers to shared subscription groups, not ‘fleet’ literally; loses idiomatic meaning. \| The translation is slightly vague compared to the reference. |
| google/translategemma-12b-it | Register for ChatGPT using an xyz domain email, US node, and you can get a Team account for one month! Solved the case, got Team fleet. | Major [Accuracy] "got Team fleet"<br>Reason: Mistranslation of slang '破案了' (case solved/figured it out) and '车队' (shared group/car pool). The hypothesis translates 'fleet' literally, losing the context of a shared subscription group. \| Mistranslation of '破案了got team车队'. Should be 'Case closed on the GPT Team shared subscription groups' not 'got Team fleet'. The term '车队' refers to shared subscription groups, not fleet. \| Misinterprets “车队” which refers to shared subscription groups, not a literal fleet. \| Mistranslation: '车队' refers to shared subscription groups, not 'fleet' \| The translation omits the context about GPT Team and shared subscription groups. |
| google/translategemma-4b-it | Title: Email registration for chatgpt using xyz domain, US node, can get a team for one month 😄. Case closed, got team car. | Major [Accuracy] "got team car"<br>Reason: Mistranslation of “车队”, which refers to shared subscription groups, not a literal car. \| Mistranslation of '破案了got team车队' - should be 'Case closed on the GPT Team shared subscription groups' not 'got team car'. '车队' means group/fleet, not 'car'. \| Mistranslation of slang '车队' (team/shared group) as literal 'car', losing the intended meaning of a shared subscription group. \| Mistranslation: '车队' (car fleet/group) was incorrectly translated as 'car', missing the meaning of 'shared subscription groups' |
| tencent/HY-MT1.5-1.8B | Translation of '用xyz 域名申请的邮箱注册chatgpt ,美国节点,可以领取team 一个月😄' in English | Major [Accuracy] "Translation of '用xyz 域名申请的邮箱注册chatgpt ,美国节点,可以领取team 一个月😄' in English"<br>Reason: The hypothesis is a meta-description of the task rather than the actual translation of the source text. No content was translated. \| Placeholder meta-text; does not provide an actual translation and omits key details like .xyz domain, US node, and one-month Team reward. \| Hypothesis is a placeholder string rather than an actual translation \| The translation is too simplistic and does not capture the full meaning of the source text. |
| tencent/HY-MT1.5-7B | Use an email account registered with an xyz domain name to chat with ChatGPT on the US node and get a free one-month trial of Team. Got the Team fleet! | Major [Accuracy] "Use an email account registered with an xyz domain name to chat with ChatGPT on the US node and get a free one-month trial of Team. Got the Team fleet!"<br>Reason: Mistranslation: 'Got the Team fleet' is incorrect. The original 'got team车队' means 'got the Team shared subscription groups' (referring to joining a group purchase for Team subscription), not a literal fleet. \| Mistranslation of '注册 chatgpt' (register for ChatGPT) as 'chat with ChatGPT', changing the meaning of the action. \| The hypothesis is only a partial translation of the title and completely omits the content. The reference provides a complete translation of both title and content. \| Misses cost/effort aspect (registering, binding card, domain) and the comparison that buying from Xianyu is more convenient; also "fleet" mistranslates "车队" which here refers to shared subscription groups. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | In VibeCoding, I was wondering why I would have to type when I could use voice input instead? | Minor [Fluency] "In VibeCoding, I was wondering why I would have to type when I could use voice input instead?"<br>Reason: Missing title translation entirely; hypothesis only contains content translation, not the title \| Unnecessary addition of 'by hand' \| Missing Title translation from source |
| CohereLabs/tiny-aya-water | In VibeCoding, I wondered why I had to type with my hands when I could use voice input instead? | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: Open-sourcing a small utility that binds the two side buttons of the mouse to voice input and enter for personal use in Vibecoding Content: While Vibecoding, I wondered why I still need to type with my hands? Why can't I just use voice input? | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Open-sourcing a Vibecoding utility that binds the mouse scroll buttons to voice input and Enter Content: When using Vibecoding, I wondered why I had to type manually. Why couldn't I just use voice input? | Major [Accuracy] "mouse scroll buttons"<br>Reason: Incorrect translation of '鼠标的两个侧键' which refers to side buttons, not scroll buttons. \| Mistranslation of '侧键' (side buttons) as 'scroll buttons'. Side buttons and scroll wheel buttons are different hardware components. \| Source explicitly says “两个侧键” (two side buttons), not scroll buttons. \| Should be 'side mouse buttons' not 'scroll buttons'; these are different mouse features |
| Qwen/Qwen3-14B | Open source a Vibecoding utility that binds the two mouse side buttons to voice input and enter | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Open source a Vibecoding personal utility that maps the two side buttons on the mouse to voice input and Enter respectively Content: While vibecoding, I wondered why I still have to type manually? Why can't I use voice input instead? | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: An open-source Vibecoding utility I use personally that binds two mouse side buttons to voice input and Enter Content: While using Vibecoding, I wondered why I still have to type with my hands? Why can't I just use voice input? | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Open source a small self-use tool that binds both mouse side keys to voice input and enter for Vibecoding |
轻微
[准确性]
"binds both mouse side keys to voice input and enter"
理由: Missing 'respectively' - should be 'voice input and enter respectively' | Missing 'respectively'. The source specifies binding one key to voice and the other to enter, not both keys to both functions. | Source says the two side keys are bound respectively to voice input and Enter; hypothesis implies both keys are bound to both functions | The translation misses the word 'respectively', which is present in the reference. |
| google/gemma-3-12b-it | Title: An open-source Vibecoding utility for assigning mouse side buttons to voice input and enter. |
轻微
[准确性]
"An open-source Vibecoding utility for assigning mouse side buttons to voice input and enter."
理由: Omission of the source content which contains the rationale ('why do I still have to type...'). The hypothesis only translates the title partially. | Missing 'respectively' which clarifies that the two side buttons are bound to different functions (voice input and enter separately). Reference includes this detail. | Missing 'respectively' which clarifies the assignment. |
| google/gemma-3-1b-it | In VibeCoding, I wonder why I need to type my hands when I can use voice input? |
轻微
[流畅性]
"type my hands"
理由: Unnatural and incorrect phrase; should be 'type with my hands' or similar. | The phrasing is slightly awkward and could be more natural. | Awkward phrasing 'type my hands' should be 'type by hand' or 'use my hands to type'. Minor grammatical issue. |
| google/gemma-3-4b-it | Title: Open Source Tool to Bind Mouse Side Buttons to Voice Input and Enter for Vibecoding | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: An open-source Vibecoding utility for personal use that binds the two side buttons of the mouse to voice input and enter. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Open-source tool for Vibecoding that binds mouse side buttons to voice input and Enter. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '开源一个把鼠标的两个侧键分别绑定为语音输入和回车的Vibecoding自用小工具' in en", "zh-TW": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in zh-TW", "ja": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in ja", "ko": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in ko", "de": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in de", "fr": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in fr", "it": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in it} |
严重
[准确性]
""en": "Translation of '开源一个把鼠标的两个侧键分别绑定为语音输入和回车的Vibecoding自用小工具' in en", "zh-TW": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in zh-TW", "ja": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in ja", "ko": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in ko", "de": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in de", "fr": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in fr", "it": "Translation of '開源一個把滑鼠的兩個側鍵分別綁定為語音輸入和回車的Vibecoding自用小工具' in it}"
理由: Hypothesis is a placeholder string rather than an actual translation | No actual translation is provided in any language; the text is purely meta-description and fails to convey the meaning of the source title. | The translation is overly verbose and does not match the concise nature of the reference. |
| tencent/HY-MT1.5-7B | Title: Open-source a small tool called Vibecoding that assigns the two side buttons of the mouse to voice input and Enter respectively. |
严重
[准确性]
"assigns the two side buttons of the mouse to voice input and Enter respectively."
理由: Source says binding to voice input and Enter while VibeCoding, implying usage context; minor omission of that context. | Content section was completely omitted from translation. | Incomplete translation; the hypothesis only contains the title and omits the entire 'Content' section of the source text. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is 龙虾🦞 the 26 years of Metaverse? |
严重
[准确性]
"26 years"
理由: Should be "'26" or "2026" (the year), not "26 years". | Should be '26 (year)' not '26 years' - missing the apostrophe for year abbreviation | Incorrect translation of '龙虾' - should be 'OpenClaw' not kept as Chinese; also '26 years' is wrong, should be "'26" (year 2026) | Failed to translate the slang term '龙虾' (OpenClaw) and retained Chinese characters. Also, '26 years of Metaverse' is a mistranslation of 'Metaverse of '26'. | Incorrect translation of 'OpenClaw' and addition of 'years' |
| CohereLabs/tiny-aya-water | Will Crab Shack be the Metaverse of 2026? |
轻微
[准确性]
"Will Crab Shack be the Metaverse of 2026?"
理由: Misses the metaphor that everyone is suddenly talking about OpenClaw like the metaverse in 2021–22; also invents "Crab Shack" which is not in source. | Mistranslation of '龙虾' (Lobster). In this context, '龙虾' refers to the 'OpenClaw' project (a play on Codex/Claw), not a restaurant chain like 'Crab Shack'. The reference correctly identifies it as 'OpenClaw'. | The hypothesis incorrectly translates '龙虾' as 'Crab Shack' instead of 'OpenClaw'. | Mistranslates '龙虾' (OpenClaw) as 'Crab Shack' and misses the emoji reference; also changes meaning from question about trend to specific product name |
| Qwen/Qwen2.5-14B-Instruct | Title: Could Lobster 🦞 Be the Metaverse of 2026? Content: It feels like everyone is suddenly talking about lobster, just as they did about the metaverse in 2021 and 2022. |
严重
[准确性]
"everyone is suddenly talking about lobster"
理由: Source refers to “龙虾” generically but the established term in context is the product/name “OpenClaw”, losing this specific reference. | The translation incorrectly uses 'Lobster' instead of 'OpenClaw' as in the reference. | Fails to recognize '龙虾' as the community slang nickname for the specific tool 'OpenClaw', translating it literally instead. |
| Qwen/Qwen2.5-7B-Instruct | Title: Is the Lobster 🦞 the Metaverse of 2026? Content: It feels like everyone is talking about lobsters overnight, just like the metaverse in 2021-2022. |
严重
[准确性]
"the Lobster"
理由: Failed to recognize '龙虾' as the community slang/nickname for 'OpenClaw', translating it literally instead. | Source “龙虾” in this context and reference show it refers to OpenClaw; using generic “Lobster” misses the product name nuance. | Incorrect translation of '龙虾' which should be 'OpenClaw' in this context. |
| Qwen/Qwen3-14B | Il granchio 🦞 sarà il metaverso del 2026? |
严重
[准确性]
"Il granchio 🦞"
理由: Incorrect translation of '龙虾'. Translated as 'granchio' (crab) but should be 'OpenClaw' (a specific product name in context). '龙虾' means lobster, but in this context refers to the product 'OpenClaw'. | "龙虾" refers to lobster/OpenClaw, not crab; animal and implied product name are mistranslated. | The hypothesis translates '龙虾' literally as 'crab' (granchio), whereas the context and reference indicate it is a slang term for 'OpenClaw'. It also omits the entire Content section. | Translation is in Italian instead of Chinese (target language zh) |
| Qwen/Qwen3.5-4B | Title: Will Lobster 🦞 be the Metaverse of 2026? Content: It feels like everyone is suddenly talking about lobster, just as they were about the metaverse in 2021-2022. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Could Lobster be the Metaverse of 2026? Content: It feels like everyone is suddenly talking about lobster, just as they did with the metaverse in 2021 and 2022. |
严重
[准确性]
"Lobster"
理由: Mistranslation: '龙虾' is a nickname for 'OpenClaw', not literal lobster | Failed to recognize '龙虾' as the specific slang nickname for 'OpenClaw' in this community context, translating it literally as the animal. | Both the source and the reference refer to “OpenClaw”, not a generic “龙虾/LOBSTER”; rendering the project name as a common noun loses key information. | Hypothesis translates '龙虾' as 'Lobster' but reference shows it should be 'OpenClaw' - this is a product/project name, not a literal translation |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Is '龙虾' going to be the next metaverse after 26 years? |
致命
[准确性]
"after 26 years"
理由: Misreads “26年的元宇宙”; it refers to year 2026, not a period of 26 years | Misinterpreted '26 年' (Year '26 / 2026) as a duration of 26 years. | Severe mistranslation: '26年' refers to year 2026, not 26 years. '龙虾' is slang for OpenClaw (a tool), not literal shrimp | Completely incorrect translation. '龙虾' is a slang term for 'OpenClaw', not a literal lobster. Should be 'Will OpenClaw 🦞 be the Metaverse of '26?' |
| google/gemma-3-12b-it | Title: Will Lobster 🦞 be the Metaverse of 2026? |
严重
[准确性]
"Lobster"
理由: Source uses "龙虾" as slang for the OpenClaw project; reference clarifies "OpenClaw". Using generic "Lobster" loses the specific product reference | '龙虾' in this context is a slang term referring to 'OpenClaw', not a literal lobster. Should be translated as 'OpenClaw' to maintain meaning. | Failure to translate the slang term '龙虾' which refers to the specific tool 'OpenClaw' in this context, translating it literally instead. |
| google/gemma-3-1b-it | Is the lobster catching a 26-year metaverse? |
严重
[准确性]
"Is the lobster catching a 26-year metaverse?"
理由: Misinterprets meaning. Source asks if 'lobster/OpenClaw' will be the Metaverse of 2026; 'catching a 26-year metaverse' is incorrect and unclear. | Incorrect translation - 'lobster' should be 'OpenClaw' (the AI tool name), 'catching' should be 'be', missing the apostrophe in '26 | Completely inaccurate translation. 'Lobster' should be 'OpenClaw' (a proper noun/product name). 'Catching' is wrong; should be 'be'. '26-year' is incorrect; should be 'of '26' or '2026'. | Mistranslation of '龙虾' in this context. Based on the reference and community slang, '龙虾' refers to a specific project (OpenClaw), not the act of catching lobsters. |
| google/gemma-3-4b-it | Title: Lobster 🦞 Will It Be the Metaverse of 2026? |
严重
[准确性]
"Lobster 🦞"
理由: Failure to recognize '龙虾' (Lobster) as a specific community slang/nickname for 'OpenClaw'. Translating it literally loses the specific referent intended in the context. | Should be 'OpenClaw' not 'Lobster'. The source uses '龙虾' which is a slang term for OpenClaw in this context, not a literal lobster. | Key term ‘龙虾’ here refers to the product ‘OpenClaw’, not literal lobsters; should use product name. |
| google/translategemma-12b-it | Title: Will Lobster be the Metaverse of 2026? |
严重
[术语]
"Lobster"
理由: Failure to recognize '龙虾' (Lobster) as the community slang for 'OpenClaw'. It should be translated as 'OpenClaw' to match the context. | Should be 'OpenClaw' (the actual project name), not translated as 'Lobster' | The translation incorrectly translates 'OpenClaw' as 'Lobster', which changes the subject. |
| google/translategemma-4b-it | Will Lobster be the Metaverse of 2026? |
严重
[准确性]
"Lobster"
理由: Missing the content translation entirely and misinterpreting the subject due to slang failure. | The title in reference is about “OpenClaw”, not generic “Lobster”; using Lobster loses the specific product name. | Should be 'OpenClaw' not 'Lobster'. '龙虾' is a slang term for OpenClaw in this context, not a literal lobster.
严重
[术语]
"Lobster"
理由: Should be 'OpenClaw' instead of 'Lobster' | Failed to translate the slang term '龙虾' to its specific community referent 'OpenClaw', interpreting it literally as the animal. | Incorrect translation: '龙虾' refers to 'OpenClaw' (a tool/AI system), not literal lobster |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾' in English |
严重
[准确性]
"Translation of '龙虾' in English"
理由: The hypothesis is a meta-instruction string, not a translation. It fails to translate the title and content, and misses the context that '龙虾' (Lobster) refers to 'OpenClaw' as shown in the reference. | Only says it is a translation of '龙虾' without giving the actual translated term (here it should reflect the product name OpenClaw rather than literal lobster). | Hypothesis is a placeholder string rather than an actual translation. The source '龙虾' is slang for 'OpenClaw' but hypothesis only translates the literal meaning |
| tencent/HY-MT1.5-7B | "en": "Title: Could Lobster 🦞 Be the Metaverse of 2026?", "zh-TW": "Title: 龙虾🦞會是26年的元宇宙嗎?", "ja": "Title: 龙虾🦞が2026年のメタバースになるのか?", "ko": "Title: 龙虾🦞이 2026년의 메타버스가 될 수 있을까?", "de":="Titel: Könnte Lobster 🦞 das Metaverse des Jahres 2026 sein?", "fr": "Titre: Le homard 🦞 pourrait-il être le métaverse de 2026 ?", "it": "Titolo: Il gambero 🦞 potrebbe essere il metaverso del 2026?" }|im_end|> |
致命
[准确性]
"Could Lobster 🦞 Be the Metaverse of 2026?"
理由: The hypothesis provides multiple language translations when only English translation was requested. More critically, it translates '龙虾' as 'Lobster' when the reference shows it should be 'OpenClaw' - a specific product/service name in this context. | Wrong entity: 'Lobster' should be 'OpenClaw' - this is a specific project name, not a literal lobster. | Failed to identify '龙虾' as the slang term for 'OpenClaw', translating it literally instead. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Raising Dragon Shrimp HaHa (It took so many tokens for just one hour of shrimp raising :sob: ) |
严重
[术语]
"Raising Dragon Shrimp HaHa"
理由: Incorrect translation of '龙虾' as 'Dragon Shrimp' instead of 'OpenClaw' | Mistranslation of the slang '养龙虾' (Running OpenClaw) as literally raising shrimp. | Misinterprets "养龙虾" which here refers to running OpenClaw, not literally raising shrimp or dragon shrimp. | Should be 'OpenClaw' not literal translation 'Dragon Shrimp' - this is a project name
严重
[准确性]
"It took so many tokens for just one hour of shrimp raising"
理由: Should be 'OpenClaw' not 'shrimp raising' | Extends the wrong metaphor; should state OpenClaw used many tokens in one hour, not shrimp raising. | Context refers to running an AI model, not farming seafood. |
| CohereLabs/tiny-aya-water | Title: Enjoying Crab Lovers (It took so many tokens for one hour of crab :sob: ) |
轻微
[准确性]
"Enjoying Crab Lovers"
理由: Mistranslation of '养龙虾'. '养' here means 'running' or 'hosting' the OpenClaw model, and '龙虾' is the model nickname. 'Crab Lovers' is a complete misinterpretation of the slang. | Changes meaning from "running OpenClaw" to "enjoying crab lovers"; mistranslates product/name and activity. | The hypothesis incorrectly translates '龙虾' as 'Crab Lovers' instead of 'OpenClaw'. | Mistranslates '养龙虾' (running OpenClaw) as 'Enjoying Crab Lovers'; also changes '龙虾' to 'crab' instead of proper noun OpenClaw |
| Qwen/Qwen2.5-14B-Instruct | Title: Raising Lobsters Haha (The lobsters used up so many tokens in just one hour :sob:) Content: Raising Lobsters Haha (The lobsters used up so many tokens in just one hour :sob:) |
严重
[准确性]
"Raising Lobsters"
理由: Should be 'Running OpenClaw' not 'Raising Lobsters'. '龙虾' is slang for OpenClaw in this context. | Translates the slang term '养龙虾' (running/hosting OpenClaw) literally as raising animals, missing the technical context entirely. | Source refers to running/using the OpenClaw system; literal “raising lobsters” misses the technical/product meaning present in the reference. | The translation incorrectly uses 'Lobsters' instead of 'OpenClaw' as in the reference. |
| Qwen/Qwen2.5-7B-Instruct | Title: Shrimp Farming Haha (The shrimp used way too many tokens in an hour :sob:) Content: Shrimp Farming Haha (The shrimp used way too many tokens in an hour :sob:) |
轻微
[准确性]
"Shrimp"
理由: Should be 'OpenClaw' - incorrect literal translation of product name | Literal translation of '养龙虾' (raising lobsters/OpenClaw). Misses the slang context referring to running the OpenClaw tool, and incorrectly changes lobster to shrimp. | Incorrect translation of '龙虾' which should be 'OpenClaw' in this context. | Source uses “龙虾” (lobster/OpenClaw), not shrimp; this changes the referent and conflicts with community jargon. |
| Qwen/Qwen3-14B | Title: Raising Crabs Haha (Crabs used up a lot of tokens in an hour :sob: ) Content: Raising Crabs Haha (Crabs used up a lot of tokens in an hour :sob: ) |
严重
[准确性]
"Raising Crabs Haha (Crabs used up a lot of tokens in an hour :sob: )"
理由: Incorrect translation of '龙虾'. Should be 'OpenClaw' not 'Crabs'. Also 'Raising' should be 'Running' for the context of software. | The hypothesis translates '龙虾' literally as 'Crabs' instead of the context-specific 'OpenClaw'. It also translates '养' as 'Raising' (breeding) instead of 'Running' (operating software). | Translation is in English instead of Chinese (target language zh) | "龙虾" refers to OpenClaw/lobster, not crabs; repeated substitution changes the subject. |
| Qwen/Qwen3.5-4B | Title: Raising Lobsters Haha (Lobster used so many tokens in one hour :sob:) Content: Raising Lobsters Haha (Lobster used so many tokens in one hour :sob:) | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Raising Lobsters Haha (A lobster used tons of tokens in an hour :sob:) Content: Raising Lobsters Haha (A lobster used tons of tokens in an hour :sob:) |
严重
[准确性]
"Raising Lobsters"
理由: Hypothesis translates '龙虾' as 'Lobsters' but reference shows it should be 'OpenClaw' - this is a product name, not a literal translation | Here “龙虾” is a nickname for the OpenClaw project, and the reference renders it as OpenClaw; translating “养龙虾” literally loses the tool name. | Failed to recognize '养龙虾' as the slang term for running/hosting the 'OpenClaw' service, translating it literally as farming animals. | Mistranslation: '养龙虾' is slang for 'running OpenClaw', not literal lobster raising |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Raising and Laughing at Shrimp (In an hour, shrimp tokens used a lot :sob: ) |
致命
[准确性]
"Raising and Laughing at Shrimp"
理由: Incorrect translation. '龙虾' is slang for 'OpenClaw', not literal shrimp. Should be 'Running OpenClaw haha' | Completely misinterprets the title. '养龙虾' means 'Running/Raising OpenClaw'. 'Laughing at' is an incorrect parsing of '哈哈' (haha). 'Shrimp' is the wrong animal (Lobster vs Shrimp) and misses the 'OpenClaw' reference. | Misinterprets “养龙虾哈哈”; context is about running OpenClaw, not raising shrimp
致命
[准确性]
"shrimp tokens"
理由: Nonsensical phrasing and wrong terminology ('shrimp' instead of 'OpenClaw'). | Should be 'OpenClaw tokens' not 'shrimp tokens' | “龙虾” refers to OpenClaw, not shrimp; token usage is about the model, not shrimp |
| google/gemma-3-12b-it | Title: Raising Lobsters Haha (An hour of lobsters used a lot of tokens :sob: ) Content: Raising Lobsters Haha (An hour of lobsters used a lot of tokens :sob: ) |
严重
[准确性]
"Raising Lobsters Haha (An hour of lobsters used a lot of tokens :sob: )"
理由: '龙虾' is slang for 'OpenClaw', not literal lobsters. Should be 'Running OpenClaw haha'. Also 'Raising' should be 'Running'. | Mistranslation of slang '养龙虾' (running OpenClaw) as literally raising lobsters. In this context, it refers to operating the AI tool. | Source/reference use "OpenClaw"; "Raising Lobsters" misinterprets the playful metaphor and loses the software name |
| google/gemma-3-1b-it | Translation of '养龙虾哈哈(一个小时龙虾就用了超多token :sob: )' in en |
严重
[准确性]
"Translation of '养龙虾哈哈(一个小时龙虾就用了超多token :sob: )' in en"
理由: Hypothesis is not a translation but a meta-description. It should provide the actual English translation, not describe what should be translated. | No actual translation provided; the hypothesis only describes the task, not the content. | Hypothesis is not a proper translation - it's a placeholder text that doesn't translate the source content | The hypothesis is a meta-commentary describing a translation task rather than providing the actual translation. It completely fails to translate the source content. | The translation is incomplete and does not cover the entire source text. |
| google/gemma-3-4b-it | Title: Raising Lobsters Haha (Raising lobsters for an hour used a lot of tokens :sob:) |
严重
[准确性]
"Raising Lobsters"
理由: Literal translation of '养龙虾'. In this technical context, '养' (raising) combined with the slang '龙虾' (OpenClaw) refers to 'Running' or 'Hosting' the service, not biological farming. | Should be 'Running OpenClaw haha' not 'Raising Lobsters Haha'. '龙虾' is slang for OpenClaw, and '养' means 'running' in this context, not 'raising'. | Mistranslation: '养龙虾' here means 'running OpenClaw' (using the tool), not literal raising of lobsters | ‘养龙虾’ in context refers to running OpenClaw, not literally raising lobsters; product name lost. |
| google/translategemma-12b-it | Title: Raising Lobsters Haha (It uses an extremely large number of tokens for just one hour of lobster raising :sob:) Content: Raising Lobsters Haha (It uses an extremely large number of tokens for just one hour of lobster raising :sob:) | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Raising Lobsters Haha (Raising lobsters for an hour used a lot of tokens :sob:) | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '养龙虾哈哈(一个小时龙虾就用了超多token :sob: )' in en |
严重
[准确性]
"Translation of '养龙虾哈哈(一个小时龙虾就用了超多token :sob: )' in en"
理由: Contains no concrete translation; does not map the slang '养龙虾' to the intended meaning of running OpenClaw or mention tokens usage. | Hypothesis is a placeholder string rather than an actual translation | The hypothesis is a meta-description string, not the actual translation. It fails to convey the meaning of '养龙虾' (Running OpenClaw) or any other part of the source. |
| tencent/HY-MT1.5-7B | "en": "Title: Raising Lobsters Haha (In just one hour, I used up so many tokens for lobsters :sob: ), "zh-TW": "Title: 養養龍蝦哈哈(一個小時內,我用掉了太多龍蝦代幣 :sob: ),", "ja": "Title: ロブスターを飼育するハハ(笑、たった一時間でロブスター用のトークンがこんなに消えた :sob: )", "ko": "Title: 로브스터를 키우는 하하(笑, 단 한 시간 만에 로브스터용 토큰이 이렇게 많이 소모되었다 :sob: )", "de":="Titel: Lobsterzucht – Ha ha, schon nach einer Stunde sind alle Tokens für Lobster aufgebraucht :sob:)", "fr": "Titre: Élevage de homards – Ha ha, en seulement une heure, tous les tokens pour homards ont été utilisés :sob:)", "it": "Titolo: Allevamento di aragoste – Ahah, in soltanto un’ora sono stati consumati tutti i token per le aragoste :sob:)"}|im_end|> |
轻微
[准确性]
"Raising Lobsters Haha (In just one hour, I used up so many tokens for lobsters :sob: )"
理由: Failed to identify '养龙虾' as slang for running the 'OpenClaw' tool, translating it literally as raising animals. | Wrong entity: 'Lobsters' should be 'OpenClaw' - this is a specific project name, not literal lobsters. | The original refers to running "OpenClaw" consuming tokens, not literally raising lobsters; this mistranslates the key term and usage. | The hypothesis provides multiple language translations when only English was requested. More critically, it mistranslates '龙虾' as 'Lobsters' when it should be 'OpenClaw' - a specific product name. The translation also misses the context that this is about token usage for a service. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: How many people have played OpenClaw yet? Content: What about the effect, how does a normal Linux server run? |
严重
[流畅性]
"What about the effect, how does a normal Linux server run?"
理由: Awkward phrasing. 'How does a normal Linux server run?' implies the server itself is running, rather than the software running on it. | Awkward phrasing; should be 'How is the performance? How does it run on a standard Linux server?' | Ungrammatical/unnatural phrasing; more natural would be "How is the performance, and how does it run on a normal Linux server?" |
| CohereLabs/tiny-aya-water | Title: How many people have played OpenClaw on strict rules? Content: What about the effect, how does a normal Linux server run? |
严重
[准确性]
"on strict rules"
理由: Added content not in source - 'on strict rules' is not in the original Chinese | Adds meaning not present in source; source just asks how many people have played OpenClaw. | Hallucination. The source text '玩过大龙虾 OpenClaw 了' means 'played with OpenClaw'. There is no mention of 'strict rules' in the source. | Adds 'on strict rules' which does not appear in source; misrepresents the meaning of the original title | The hypothesis incorrectly adds 'on strict rules' which is not present in the source or reference. |
| Qwen/Qwen2.5-14B-Instruct | Title: How many people have played OpenClaw on this site? Content: What's the effect like, how does it run on a regular Linux server? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: How many people have played Big Lobster with OpenClaw on this site Content: How does it work, how about running it on a regular Linux server |
严重
[准确性]
"Big Lobster with OpenClaw"
理由: Incorrect translation of '大龙虾 OpenClaw' which should be 'OpenClaw' without 'Lobster'. | Redundant and confusing translation. '大龙虾' is the slang name for OpenClaw; the translation treats them as two separate entities. | Source: “大龙虾 OpenClaw” is an app nickname; better rendered as “OpenClaw (Big Lobster)” or just “OpenClaw”. “played Big Lobster with OpenClaw” is inaccurate. | Awkward phrasing; should be 'OpenClaw' alone, not 'Big Lobster with OpenClaw' |
| Qwen/Qwen3-14B | Quante persone hanno giocato a OpenClaw (Granchio Gigante) su questo sito |
严重
[准确性]
"Granchio Gigante"
理由: Translation is in Italian instead of Chinese (target language zh) | Incomplete translation - only title translated, content missing entirely. Also '大龙虾' should be 'OpenClaw', not 'Granchio Gigante' (Giant Crab). | Adds "Giant Crab" which is not in source; also misinterprets "大龙虾"/OpenClaw, changing lobster metaphor to crab and adding size. | The hypothesis omits the Content section entirely. It also adds an unnecessary and potentially confusing literal translation '(Granchio Gigante)' for OpenClaw. |
| Qwen/Qwen3.5-4B | How many people have played with Big Claw on this site? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: How many people have played Big Lobster OpenClaw on this site? Content: How is the effect, how does it run on an ordinary Linux server? |
严重
[准确性]
"Big Lobster OpenClaw"
理由: The source's “大龙虾 OpenClaw” is a playful nickname, and the reference keeps only “OpenClaw”. Rendering it as “Big Lobster OpenClaw” reads as the official product name and adds a name component that does not exist. | Mistranslation: '大龙虾' is not a thing; should be 'OpenClaw' only | Misinterpreted '大龙虾' (Big Lobster) as part of the proper name or a literal descriptor, rather than recognizing it as the slang term for OpenClaw. | Hypothesis translates '大龙虾' as 'Big Lobster OpenClaw' but reference shows just 'OpenClaw' - '大龙虾' is a nickname for the product, not a literal descriptor
轻微
[流畅性]
"How is the effect"
理由: Unnatural phrasing for '效果咋样'; 'How is the performance' or 'How does it work' is more appropriate. | Hypothesis uses 'How is the effect' but reference uses 'How is the performance' - more natural phrasing | In English, the more natural way to ask about software quality is “performance” or “how well does it work”; the current wording reads as a literal carry-over from Chinese. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | How many people have played OpenClaw on this site? |
轻微
[风格]
"on this site"
理由: Should be 'on this forum' to match reference | 'On this forum' or 'in this community' fits the context of '本站' better in a discussion board setting. | Slightly less natural than “on this forum” for the context but meaning is preserved |
| google/gemma-3-12b-it | How many people have played 大龙虾 OpenClaw on this site? |
严重
[准确性]
"How many people have played 大龙虾 OpenClaw on this site?"
理由: Unnecessary addition of '大龙虾' which is not in the source. | Incomplete translation. Should be 'How many people on this forum have played with OpenClaw?' The source uses '大龙虾' (slang for OpenClaw) which should be translated as just 'OpenClaw', not kept in Chinese. | Failure to translate or localize the slang '大龙虾', leaving Chinese characters in the English output. |
| google/gemma-3-1b-it | How does it look? How does it run on a standard Linux server? |
轻微
[准确性]
"How does it look?"
理由: Source asks about performance/effect ('效果咋样'), not visual appearance. | The translation is partial and does not cover the entire source text. | Incomplete translation. The source '效果咋样' asks about the performance/effect, which is better captured as 'How is the performance?' as seen in the reference. The hypothesis misses the context of the first question entirely. |
| google/gemma-3-4b-it | How many people have played Big Lobster on this site? |
严重
[准确性]
"Big Lobster"
理由: Refers to ‘OpenClaw’ product, not ‘Big Lobster’; key term mistranslated. | Failure to translate the proper noun/slang '大龙虾 OpenClaw'. 'Big Lobster' is nonsensical in English; it should be 'OpenClaw'. | Missing content translation. Should include 'How many people on this forum have played with OpenClaw?' and the content about performance and Linux server. |
| google/translategemma-12b-it | How many people have played OpenClaw on this site? |
严重
[准确性]
"How many people have played OpenClaw on this site?"
理由: The translation omits the content about performance and running on a Linux server. | Hypothesis only translates title but source has content about performance and Linux server. Incomplete translation. | Omits follow‑up question about performance and Linux server from source content; partial translation only. |
| google/translategemma-4b-it | How many people have played OpenClaw on this site? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '本站有多少人玩过大龙虾 OpenClaw 了' in en |
严重
[准确性]
"Translation of '本站有多少人玩过大龙虾 OpenClaw 了' in en"
理由: Missing actual translated content; also should preserve the product name OpenClaw and the sense of people on this site having used it. | Hypothesis is a placeholder string rather than an actual translation | The hypothesis is a meta-description string, not the actual translation. It completely fails to translate the source text. |
| tencent/HY-MT1.5-7B | "en": "Title: How many people on this site have played OpenClaw: Big Lobster?", "zh-TW": "Title: 多少人在这个网站上玩过OpenClaw: Big Lobster?", "ja": "Title: このサイトでOpenClaw: Big Lobsterをプレイした人は何人いるのか?", "ko": "Title: 이 사이트에서 OpenClaw: Big Lobster를 플레이한 사람은 몇 명인가요?", "de":="Title: Wie viele Menschen auf dieser Website haben OpenClaw: Big Lobster gespielt?", "fr":="Title: Combien de personnes sur ce site ont joué à OpenClaw: Big Lobster?", "it":="Title: Quante persone su questo sito hanno giocato a OpenClaw: Big Lobster?"}|im_end|> |
严重
[准确性]
"Big Lobster"
理由: Added 'Big Lobster' which is not in the source - this is an incorrect addition. | Failed to recognize '大龙虾' as part of the slang for 'OpenClaw', translating it literally. | Adds "Big Lobster" which is not present in source or reference; OpenClaw is the proper name and is already sufficient. | The hypothesis provides multiple language translations when only English was requested. More critically, it adds 'Big Lobster' as a subtitle when the reference shows the product is simply called 'OpenClaw'. The content translation is also completely missing. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Newbie, followed the tutorial by the in-site guy to set up sub2api site. Do I need any protection measures to publish as a public site? |
轻微
[准确性]
"to publish as a public site"
理由: 'Public site' misses the nuance of '公益站' (free/public welfare API proxy site). | Missing context 'free API proxy site' which is important for understanding the domain | Misses nuance that it is a free/public-welfare API proxy site, not just any public site. | Missing 'free API proxy' |
| CohereLabs/tiny-aya-water | Title: Newbie, followed the tutorial by the in-site guy to set up sub2api site, do I need any protection measures to publish as a public proxy? |
严重
[准确性]
"to publish as a public proxy"
理由: Slight inaccuracy; source indicates "公益站" (free/public-benefit site) specifically for API proxy, not just generic "public proxy". | The hypothesis incorrectly translates '公益站' as 'public proxy' instead of 'free API proxy site'. | The hypothesis only translates the Title and omits the Content entirely, which lists specific security measures already taken (Cloudflare Tunnels, Turnstile, etc.). |
| Qwen/Qwen2.5-14B-Instruct | Title: Newbie, followed the tutorials inside the site to set up a sub2api site, do I need any additional protection measures to release it as a free shared API site? Content: Currently done: Added cloudflare tunnels and Cloudflare Turnstile, registration with invitation codes or login via linuxdo. Seems like that's all; | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: Newbie, after following the tutorial from the station elder to set up sub2api site, what protective measures do I need to take if I want to publish it as a public proxy? \nContent: What have been done so far: added cloudflare tunnels and Cloudflare Turnstile, invite code registration or linuxdo login. Seems like that’s all;;", "zh-TW": "標題:新手,依照站內長的教學設定 sub2api 站點,如果要發佈為公益站還需要做哪些保護措施?\n內容:目前已完成:加入 cloudflare tunnels 和 Cloudflare Turnstile,使用邀請碼註冊或 linuxdo 登入。看來就這些了;;", "ja": "タイトル:初心者、ステーションの長に教わってsub2apiサイトを作成しましたが、公益サイトとして公開するにはどのような保護措置が必要ですか?\n内容:既に行われたこと:Cloudflare Tunnels と Cloudflare Turnstile を追加し、招待コードで登録または Linuxdo ログイン。これくらいか;;", "ko": "제목: 초보자, 스테이션 선배의 가이드를 따라 sub2api 사이트를 설정했습니다만, 공익站在这里不能直接翻译为韩语,应保持术语的一致性,因此保留“公益站”作为关键词。 \n内容:已完成事项:添加了 cloudflare tunnels 和 Cloudflare Turnstile,使用邀请码注册或 Linuxdo 登录。似乎就是这样了;;", "de": "Titel: Neuling, nachdem ich den Tutorial von der Station-Oberen für die sub2api-Station gemacht habe, welche Schutzmaßnahmen muss ich ergreifen, um sie als Public Proxy zu veröffentlichen?\nInhalt: Bislang getroffene Maßnahmen: Cloudflare Tunnels und Cloudflare Turnstile hinzugefügt, Einladungscode-Registrierung oder Linuxdo-Anmeldung. Scheint alles zu sein.;", "fr": "Titre : Nouveau venu, après avoir suivi le tutoriel du chef de la station pour mettre en place une sub2api, quels mesures de protection dois-je prendre si je veux la publier comme une station d'utilité publique ?\nContenu : Ce qui a été fait jusqu'à présent : ajouté les tunnels Cloudflare et Cloudflare Turnstile, inscription avec un code d'invitation ou connexion via Linuxdo. 
Semble que c'est tout ;;", "it": "Titolo: Nuovo arrivato, dopo aver seguito il tutorial del capostazione per configurare la sub2api, quali misure di protezione devo prendere se voglio pubblicarla come una statione di utilità pubblica?\nContenuto: Finora fatto: sono stati aggiunti i tunnel Cloudflare e Cloudflare Turnstile, registrazione con codice di invito o accesso tramite Linuxdo. Pare che sia tutto ;;" } | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Title: Newbie, followed the tutorial from the site veterans to create a sub2api site. Do I need to do any additional protective measures to publish it as a free shared API site? Content: What's already been done: added cloudflare tunnels and Cloudflare Turnstile, invitation code registration or linuxdo login. Seems like that's all;; | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Newbie, I followed tutorials from someone on this site to set up a sub2api site. What additional protections do I need to implement before publishing as a free shared API site? Content: Currently implemented: Added Cloudflare Tunnels and Cloudflare Turnstile, invitation-code registration or login via linuxdo. Seems like that's all; | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Newcomer, followed the tutorial from an experienced user on the site and created a sub2api site. What additional protection measures are needed before publishing as a free shared API site? Content: Already implemented: added Cloudflare Tunnels and Cloudflare Turnstile, registration via invitation code or LinuxDo login. Seems like that's all; |
轻微
[术语]
"free shared API site"
理由: In context, the source term '公益站' usually means a free relay or public-benefit relay site ('免费中转/公益中转站'); the reference has 'free API proxy site', and the current rendering slightly weakens the 'proxy/relay' sense. | Should be 'free API proxy site' or '公益站' | Hypothesis uses 'free shared API site' but reference uses 'free API proxy site' - more accurate terminology | '公益站' in this context is better translated as 'free API proxy site' or 'community free station' to match the reference and common usage, though 'shared' is not entirely wrong.
轻微
[术语]
"LinuxDo"
理由: The reference has 'LINUX DO'; keeping the original brand spelling would be more consistent, but this is minor. | Hypothesis capitalizes as 'LinuxDo' but reference uses 'LINUX DO' - inconsistent with source capitalization | Should be 'LINUX DO' to match source capitalization |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | New user following internal guide to create sub2api site, need additional protection measures if making it public? | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Newbie, followed the tutorials from experienced users within the site and created a sub2api site. What protective measures do I need to take to release it as a public proxy? Content: Currently done: Added Cloudflare Tunnels and Cloudflare Turnstile, invite code registration or linuxdo login. Seems like that’s all; | 通过 (无共识错误) |
| google/gemma-3-1b-it | Currently implemented: added Cloudflare tunnels and Cloudflare Turnstile, registration codes or Linux do login. Seems like it's missing; ; |
轻微
[术语]
"registration codes or Linux do login"
理由: 'Linux do' should be 'LINUX DO' (proper noun/platform name). Capitalization and spacing are incorrect. | 'linuxdo登录' refers to 'LinuxDo login' (site/platform name); 'Linux do' spacing is slightly off but understandable. | The term 'Linux do' should be 'LINUX DO' to match the reference and maintain proper capitalization.
轻微
[流畅性]
"Seems like it's missing; ;"
理由: Awkward phrasing. 'That seems to be it' or 'Seems like that's all' is more natural than 'Seems like it's missing' for '好像没了'. | Slightly awkward phrasing 'Seems like it's missing' - could be 'That seems to be it' to match reference | Ungrammatical phrasing; natural English would be 'That seems to be it;;' or similar. | Double semicolon ';;' is awkward. Should be single semicolon or different punctuation. |
| google/gemma-3-4b-it | Title: New users followed the tutorial of inner circle and built a sub2api site, do I need to take any protective measures to publish it as a public proxy site? |
轻微
[术语]
"public proxy site"
理由: '公益站' specifically refers to a free/community-run API proxy. 'Public proxy site' is acceptable but 'free API proxy site' captures the 'public welfare/free' nuance better. | Should be 'free API proxy site' to match the reference and better convey the meaning of '公益站'. | The translation is slightly less accurate than the reference. |
| google/translategemma-12b-it | Title: Newbie - I created a sub2api site following tutorials from experienced users within the forum. What protective measures do I need to take before publishing it as a public service? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: New user, followed the tutorial from inner experts and set up a sub2api site, is there any protection measures needed when publishing as a public API site? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 新人,跟着站内佬的教程做了sub2api站点,要发布为公益站还得做什么保护措施吗?' in en |
严重
[准确性]
"Translation of 'Title: 新人,跟着站内佬的教程做了sub2api站点,要发布为公益站还得做什么保护措施吗?' in en"
理由: The hypothesis is a meta-description string, not the actual translation. It fails to translate the title and ignores the content entirely. | No real translation is given; omits details like being a newbie, following in-forum experts’ tutorials, and releasing as a public/free site. | Hypothesis is a placeholder string rather than an actual translation |
| tencent/HY-MT1.5-7B | "en": "Title: New users have created sub2api sites following the tutorials on the platform. What additional protective measures are needed to make them public welfare sites?", "zh-TW": "标题:新用户已按照平台上的教程创建了sub2api网站。要将其设为公益网站,还需要采取哪些防护措施?", "ja": "タイトル:新規ユーザーがプラットフォームのチュートリアルに従ってsub2apiサイトを作成しました。これらを公益サイトとして公開するためには、どのような追加的な保護措置が必要でしょうか?", "ko": "제목: 신규 사용자들이 플랫폼의 가이드에 따라 sub2api 사이트를 만들었습니다. 이를 공익 사이트로 설정하기 위해서는 어떤 추가적인 보호 조치가 필요한가요?", "de":="Titel: Neue Nutzer haben Sub2api-Sites nach den Anleitungen auf der Plattform erstellt. Welche zusätzlichen Schutzmaßnahmen sind erforderlich, um sie zu gemeinnützigen Websites zu machen?", "fr":="Titre : Les nouveaux utilisateurs ont créé des sites Sub2api en suivant les tutoriels de la plateforme. Quelles mesures de protection supplémentaires sont nécessaires pour les transformer en sites à but non lucratif ?", "it":="Titolo: I nuovi utenti hanno creato siti Sub2api seguendo i tutorial della piattaforma. Quali ulteriori misure di protezione sono necessarie per renderli siti a scopo non profit?"}|im_end|> |
严重
[准确性]
"public welfare sites"
理由: The hypothesis provides multiple language translations when only English was requested. The translation also misses the informal tone ('佬' = 'bros') and the specific context of 'linuxdo login' which should be preserved. The content is completely missing. | Mistranslation of '公益站' in this technical context. It refers to a 'free API proxy site' for the community, not a charitable/non-profit organization website. | "公益站" here means a free API proxy site for public use, not a general charitable/non-profit website; the purpose (free API proxy) is lost. | The hypothesis contains multiple language translations mixed together instead of providing a single target translation. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Open Source】A Guide to Raisin Crabs, Even if you used other models than OPS 4.6, ClawPulse helps you know what your raisin crabs are doing behind the scenes |
严重
[准确性]
"Even if you used other models than OPS 4.6"
理由: Incorrect model name. Should be 'Opus 4.6'. | Should be 'Opus 4.6' not 'OPS 4.6' | Source contrasts Opus 4.6 with other models; spelling "OPS" is wrong and "even if" changes nuance slightly but main problem is name error.
轻微
[准确性]
"helps you know what your raisin crabs are doing behind the scenes"
理由: Propagates the incorrect 'raisin crabs' term instead of OpenClaw; also omits monitoring whether backend is working. | Hallucinated translation. '养虾' (raising shrimp/OpenClaw) became 'Raisin Crabs'. | Incorrect translation of '龙虾' and missing context about 'monitoring' | Should be 'ClawPulse helps you know exactly what your OpenClaw is doing' not 'raisin crabs' |
| CohereLabs/tiny-aya-water | Title: 【Open Source】A Guide to Raisin Shrimp Farming, Even if you used other models than OPSUM 4.6, ClawPulse will help you know what your raisin shrimp is doing behind the scenes |
致命
[准确性]
"A Guide to Raisin Shrimp Farming"
理由: Mistranslates '养龙虾' (running OpenClaw) as 'Raisin Shrimp Farming'; 'raisin' is nonsensical | Severe mistranslation. '养虾' refers to 'running OpenClaw' (Lobster/Shrimp slang for the model). 'Raisin Shrimp' is nonsensical and likely a confusion with 'raisin' (perhaps misreading characters) or a hallucination. | The hypothesis incorrectly translates '养虾指南' as 'Raisin Shrimp Farming' instead of 'OpenClaw deployment guide'. | Completely distorts meaning of "养虾指南" which refers to running OpenClaw, not literal or "raisin" shrimp farming. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Open Source] Lobster Raising Guide, No Fear Even When Using Other Models, ClawPulse Helps You Understand What Your Lobsters Are Up To Content: Those raising lobsters must have encountered several issues, especially when you didn't use the opus 4.6 model but another one... openclaw monitor was created to keep an eye on whether the backend is actually working or not. |
严重
[准确性]
"Lobster Raising Guide"
理由: The main subject is OpenClaw deployment/usage; literal “raising lobsters” loses the product/tool nuance indicated in the reference. | Should be 'OpenClaw deployment guide' not 'Lobster Raising Guide'. '龙虾' is slang for OpenClaw. | Fails to translate the slang '养虾' (deploying OpenClaw) correctly, using a literal animal husbandry interpretation. |
| Qwen/Qwen2.5-7B-Instruct | Title: [Open Source] Shrimp Guide, No Worries Even if You Use Other Models, ClawPulse Tells You What Your Crayfish Are Up To Content: Friends who breed crayfish have definitely encountered some issues, especially when you didn’t use the opus 4.6 model but used other models instead... OpenCLAW Monitor was created to check whether the backend is actually working or not. |
严重
[准确性]
"crayfish"
理由: Inconsistent terminology. Uses 'Shrimp' in title and 'Crayfish' in content for the same source term '龙虾' (OpenClaw slang). | “龙虾” in this community context is OpenClaw, not actual crayfish/lobster animals. | Incorrect translation of '龙虾' which should be 'OpenClaw' in this context. |
| Qwen/Qwen3-14B | 【Open Source】Guida per l'allevamento dei gamberi, anche con i modelli è ok, ClawPulse ti dice cosa sta facendo il tuo gambero |
严重
[准确性]
"Guida per l'allevamento dei gamberi"
理由: Source is about "OpenClaw deployment" (running a system), not literally shrimp farming; metaphor should stay tied to OpenClaw. | Translation is in Italian instead of Chinese (target language zh) | The hypothesis omits the Content section. It mistranslates '养虾' (running OpenClaw) as 'breeding shrimp' (allevamento dei gamberi) and fails to translate 'OpenClaw' consistently, using literal 'gambero' instead. | Incorrect translation of '养虾指南'. Should be 'OpenClaw deployment guide' not 'Guida per l'allevamento dei gamberi' (Shrimp farming guide). Also missing 'ClawPulse' context. |
| Qwen/Qwen3.5-4B | Title: [Open Source] A guide to raising shrimp, even if you use ClawPulse models, you won't worry about what your lobster is doing. Content: Shrimp farmers definitely have encountered several issues, especially when you don't use the opus 4.6 model but instead use other models... openclaw monitor was created specifically to monitor whether the background process is actually running. |
严重
[准确性]
"ClawPulse models"
理由: Misinterprets the sentence structure. The source says 'even if you use [bad] models, ClawPulse helps...', not 'even if you use ClawPulse models'. | Unnecessary addition of 'models'. | Mistranslation; should be 'bad model' not 'ClawPulse models' |
| Qwen/Qwen3.5-9B | Title: [Open Source] Shrimp Raising Guide, No Fear Using Large Models, ClawPulse Helps You Know What Your Lobster Is Actually Doing Content: Friends who raise lobsters have surely encountered several issues, especially when you use models other than opus 4.6... |
严重
[准确性]
"Shrimp Raising Guide"
理由: Mistranslation: '养虾指南' should be 'OpenClaw deployment guide', not literal shrimp raising | Hypothesis translates '养虾指南' as 'Shrimp Raising Guide' but reference shows 'OpenClaw deployment guide' - '养虾' is slang for running OpenClaw, not literal shrimp | In context, the source '养虾指南' refers to OpenClaw / OpenClaw deployment, not actual shrimp farming; the reference explicitly has 'OpenClaw deployment guide', so the literal translation distorts the meaning. | Failed to identify '养虾' as the slang for deploying/running OpenClaw, translating it literally as raising shrimp. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Translation of '龙虾' in en |
致命
[准确性]
"Translation of '龙虾' in en"
理由: Severe mistranslation: This is a title that should translate the actual content, not describe the translation task. Failed to translate '养虾指南' (OpenClaw deployment guide) and 'ClawPulse' | Not a translation of the source at all; it is a meta request phrase unrelated to the original content about OpenClaw deployment and monitoring | The hypothesis is a meta-comment describing the task rather than providing the actual translation of the source text. | Hypothesis is incomplete and not a proper translation. Should translate the full title and content about OpenClaw deployment guide | The translation is missing and does not provide the correct term 'OpenClaw'. |
| google/gemma-3-12b-it | Title: [Open Source] Lobster Farming Guide - Don't Be Afraid to Use LLMs, ClawPulse Tells You What Your Lobsters Are Up To Content: Lobster farmers have definitely encountered several issues, especially when using models other than opus 4.6... |
严重
[准确性]
"Lobster Farming Guide"
理由: Source refers to OpenClaw deployment/usage; "Lobster Farming" over-literalizes the metaphor and drops the explicit OpenClaw reference | Literal translation of '养虾' (raising shrimp/lobsters) instead of the intended meaning 'OpenClaw deployment guide'. | '养龙虾' uses slang '龙虾' (OpenClaw), should be 'OpenClaw deployment guide' not 'Lobster Farming Guide'.
严重
[准确性]
"Don't Be Afraid to Use LLMs"
理由: Should be 'Don't worry if you use a bad model' not 'Don't be afraid to use LLMs' | Source says "用了拉模型也不怕" (using a bad/cheap model); "LLMs" changes the meaning from model quality to model type | Mistranslation of '用了拉模型也不怕' (don't be afraid if you use a bad/slow model). 'LLMs' is too generic and misses the negative connotation of '拉' (bad/poor quality). |
| google/gemma-3-1b-it | Open Claw Monitor is designed to monitor background tasks and provide insights into what your shrimp are doing. |
轻微
[准确性]
"provide insights into what your shrimp are doing"
理由: Adds 'provide insights' which is not in the source; source focuses on monitoring whether it is working in the background. | The source specifically asks about monitoring 'whether it is actually working' (到底在没在工作), while the hypothesis generalizes this to 'provide insights', losing the specific intent of checking operational status. | Incomplete translation. Missing key context about '养龙虾的佬友们' (bros running OpenClaw), model issues, and the specific purpose of monitoring whether it's working in the background. | 'shrimp' should be 'OpenClaw' - the source refers to OpenClaw (the AI tool), not actual shrimp. Also missing the context of 'whether it is actually working in the background' vs just 'monitor background tasks'
轻微
[术语]
"what your shrimp are doing"
理由: The term 'shrimp' is incorrectly used instead of 'OpenClaw'. | 'shrimp' is a literal rendering; context is OpenClaw metaphor for 'lobster/OpenClaw instance'. Could be more precise but understandable. | Inconsistent terminology. The source '龙虾' and context refer to 'OpenClaw' (often slangily called Lobster), but the hypothesis switches to 'shrimp', losing the specific project reference. | 'Open Claw' should be 'OpenClaw' (one word) to match the proper name |
| google/gemma-3-4b-it | 【Open Source】Shrimp Raising Guide, Even if you use the Claw model, don’t worry! ClawPulse helps you know what your shrimp is doing. |
严重
[准确性]
"Shrimp Raising Guide"
理由: Mistranslation of '养虾' (raising shrimp). Contextually refers to 'OpenClaw deployment/usage'. 'Shrimp' is also the wrong animal (Source: 龙虾/Lobster, Hypothesis: Shrimp). | Should be 'OpenClaw deployment guide' not 'Shrimp Raising Guide'. '养虾' is slang for running OpenClaw. | ‘养虾/龙虾’ refers to running OpenClaw, not literal shrimp raising; product context lost.
严重
[准确性]
"Even if you use the Claw model, don’t worry!"
理由: Source says 'opus 4.6 模型' (Opus 4.6 model). Hypothesis incorrectly changes this to 'Claw model'. | Mistranslation: '用了拉模型' means 'using a bad/poor model' not 'using the Claw model' | Source says using a bad/other model, not ‘Claw model’; reverses nuance. | Should be 'use a bad model' not 'use the Claw model'. The source says '用了拉模型' which means using a poor/bad model. |
| google/translategemma-12b-it | Title: [Open Source] Shrimp Farming Guide: Don't be afraid to use LLMs, ClawPulse helps you understand what your shrimp is doing. | 通过 (无共识错误) |
| google/translategemma-4b-it | "en": "Title: [Open Source] Shrimp Farming Guide, Even if you don't use the La model, ClawPulse will tell you what your shrimp is doing.", "zh-TW": "標題:[開源] 養蝦指南,即使您沒有使用La模型,ClawPulse也能告訴您您的蝦子在做什麼。", "ja": "タイトル:[オープンソース] スクリューの飼育ガイド、opus 4.6モデル以外のモデルを使用する場合でも、ClawPulseがあなたのスクリューが何をしているかを教えてくれます。", "ko": "제목: [오픈 소스] 새우 사육 가이드, opus 4.6 모델 외 다른 모델을 사용하는 경우에도 ClawPulse가 당신의 새우가 무엇을 하는지 알려줍니다.", "de": "Titel: \[Open Source] Anleitung zur Schalentierhaltung, auch wenn Sie kein opus 4.6 Modell verwenden, hilft Ihnen ClawPulse zu verstehen, was Ihre Garnelen tun.", "fr": "Titre : \[Open Source] Guide à l'élevage de crevettes, même si vous n'utilisez pas le modèle opus 4.6, ClawPulse vous informera de ce que font vos crevettes.", "it": "Titolo: \[Open Source] Guida all'allevamento di gamberi, anche se non si utilizza il modello opus 4.6, ClawPulse vi informerà su ciò che fanno i vostri gamberi."} <|im_end|> | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '开源' in en |
严重
[准确性]
"Translation of '开源' in en"
理由: The hypothesis is a meta-description string, not the actual translation. It fails to translate the title and content, missing the context of 'OpenClaw' and 'ClawPulse'. | Hypothesis is a placeholder string rather than an actual translation. Only translates a single word '开源' without context from the full title | Does not actually translate the term; answer should be 'open source' and ideally reflect its bracketed use in the title context. |
| tencent/HY-MT1.5-7B | "en": "Title: [Open Source] Shrimp Farming Guide: Don’t Worry Even If You Use a Different Model – ClawPulse Helps You Understand What Your Lobsters Are Doing Exactly", "zh-TW": "标题:[開源] 鰻魚養殖指南:即使使用其他模型也无需擔心——ClawPulse能幫您了解您的龍蝦究竟在做什么", "ja": "タイトル:[オープンソース] エビの養殖ガイド:別のモデルを使用しても心配無用——ClawPulseがあなたのロブスターが実際に何をしているかを教えてくれます", "ko": "제목:[오픈소스] 새우 양식 가이드: opus 4.6 모델을 사용하지 않아도 걱정 없음 — ClawPulse가 당신의 로브스터가 정확히 무엇을 하고 있는지 알려줍니다", "de":="Titel: [Offene Quelle] Garnelenzuchtanleitung: Keine Sorge, auch wenn Sie ein anderes Modell verwenden – ClawPulse hilft Ihnen dabei herauszufinden, was Ihre Robben genau tun", "fr":="Titre : [Open Source] Guide à l’élevage de crevettes : Ne vous inquiétez pas même si vous utilisez un autre modèle – ClawPulse vous aide à comprendre ce que vos homards font réellement", "it":="Titolo: [Open Source] Guida all’allevamento di gamberi: Non preoccupatevi nemmeno se utilizzate un modello diverso – ClawPulse vi aiuta a capire esattamente cosa fanno i vostri aragoste"} |
致命
[准确性]
"[Open Source] Shrimp Farming Guide: Don't Worry Even If You Use a Different Model – ClawPulse Helps You Understand What Your Lobsters Are Doing Exactly"
理由: Wrong entity: 'Shrimp' and 'Lobsters' should be 'OpenClaw' - this is a specific project name, not literal shrimp/lobsters. | Failed to identify '养虾' as slang for deploying 'OpenClaw', translating it literally as shrimp farming. | The hypothesis provides multiple language translations when only English was requested. More critically, it mistranslates '龙虾' as 'Lobsters' when it should be 'OpenClaw'. It also translates '养龙虾' as 'Shrimp Farming' when the reference shows it should be 'OpenClaw deployment'. The content is completely missing. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Claude Code is not reliable |
严重
[准确性]
"Claude Code is not reliable"
理由: Overly generic; omits key meaning about quickly hitting the Pro daily limit and issues with token quota. | Severe under-translation. The source discusses exhausting limits quickly ('难蚌', '不耐用'), not general reliability. Misses the specific context of the Title. | Doesn't capture the main point about daily limit | Completely inaccurate title; should be 'Speechless, Claude Code blew through the Pro subscription's daily limit in half an hour' | Incomplete translation - missing Content entirely and Title is oversimplified |
| CohereLabs/tiny-aya-water | Title: Hard to believe, Claude Code completed half an hour ago the daily limit for Pro subscription |
严重
[流畅性]
"completed half an hour ago the daily limit"
理由: Awkward phrasing - 'completed' should be 'exhausted' or 'blew through'; missing 'in' before 'half an hour' | Awkward phrase order; should be "blew through the daily limit in half an hour" or similar for natural English. | Awkward phrasing; should be 'blew through' or 'exhausted' the daily limit, not 'completed' |
| Qwen/Qwen2.5-14B-Instruct | Title: Wow, Claude Code finished the daily limit of Pro subscription in half an hour Content: Claude Code this is too inefficient. I also subscribed to Cursor's old billing method, running over a billion tokens per month isn't a problem... It seems like max subscription is necessary for unrestricted use. |
轻微
[风格]
"Wow"
理由: '难蚌' is internet slang for 'speechless' or 'hard to hold back laughter', 'Wow' does not capture the nuance. | Reference uses 'Speechless' (难蚌) which is a specific internet slang expression, not generic 'Wow' | Adds an interjection not present in source; minor stylistic embellishment.
轻微
[准确性]
"too inefficient"
理由: “不耐用” is closer to ‘not durable/lacking endurance’; “inefficient” slightly shifts nuance from consumption rate to performance efficiency. | The source '不耐用' refers to durability/endurance of the quota, not efficiency of the code. | Should be 'lacking in endurance' (不耐用) not 'inefficient' | Should be 'is way too lacking in endurance' - better captures the meaning of '太不耐用了' |
| Qwen/Qwen2.5-7B-Instruct | Title: Nambang, Claude Code finished the Pro subscription's daily limit in half an hour Content: Claude Code is really not durable. I subscribed to Cursor myself, and their old billing method allows running over a billion tokens per month... | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Difficile Bān, Claude Code ha raggiunto il limite giornaliero dell'abbonamento Pro in mezz'ora |
严重
[准确性]
"Difficile"
理由: The hypothesis omits the Content section. It mistranslates '难蚌' (slang for speechless/cannot hold back laughter) as 'Difficile Bān', which is nonsensical in Italian. | Translation is in Italian instead of Chinese (target language zh) | Even ignoring "Bān", "Difficile" does not convey the exasperated/speechless nuance of the slang. | Incorrect translation of '难蚌'. Should be 'Speechless' not 'Difficile Bān'. '难蚌' is internet slang meaning 'hard to accept/speechless'. |
| Qwen/Qwen3.5-4B | Title: Stuck! Claude Code finished Pro's daily quota in half an hour Content: Claude Code is also too unreliable. I'm subscribed to Cursor myself; their old billing model works fine for running billions of tokens per month... To use Claude Code properly, you'll need the Max subscription. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Hard to swallow, Claude Code finished its daily Pro subscription limit in half an hour Content: Claude Code is just not durable enough. I also subscribe to Cursor; with Cursor's old billing method, running over a billion tokens per month was no problem... If you want to use Claude Code comfortably, you'll have to go for the max subscription. |
轻微
[风格]
"Hard to swallow"
理由: Incorrect idiom translation for '难蚌' (slang for 'speechless' or 'ridiculous', derived from 'nan beng'). 'Hard to swallow' implies disbelief in a different sense. | Should be 'Speechless' to match '难蚌' internet slang | In internet slang, '难蚌/难绷' is closer to 'speechless' or 'exasperated'; the reference uses 'Speechless'. 'Hard to swallow' sounds too serious, a slight register mismatch. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Clam Shell, Claude Code can finish daily subscription limits in half an hour Content: Clam Shell is really not worth it. I subscribed to Cursor too. Cursor's old billing method allows a billion tokens each month without any issues... It seems that Claude Code wants to be as good as possible with max subscription. |
致命
[准确性]
"Clam Shell"
理由: “难蚌” is slang meaning speechless/awkward, not literally clam shell | Completely wrong translation of '难蚌' (internet slang for 'speechless' or 'can't hold back laughter', literally 'hard clam' but used idiomatically). It is not referring to a clam shell. | '难蚌' is slang meaning 'speechless', not 'clam shell'. Should be 'Speechless, Claude Code blew through the Pro subscription's daily limit in half an hour' |
| google/gemma-3-12b-it | Title: Difficult Clams, Claude Code finished the Pro subscription daily limit in half an hour Content: Claude Code is really not durable. I also subscribe to Cursor, and Cursor’s old billing method can easily handle billions of tokens per month... It seems like if you want to enjoy Claude Code, you need to upgrade to the max subscription. |
严重
[准确性]
"Difficult Clams"
理由: Complete hallucination/mistranslation of '难蚌' (slang for 'speechless' or 'hard to hold back laughter', pun on 'nan beng'). 'Clams' is unrelated. | '难蚌' is an internet slang expression meaning 'speechless/at a loss for words', not literally 'difficult clams'. Should be 'Speechless'. | "难蚌" is slang meaning "speechless/awkward"; literal "Difficult Clams" is confusing and incorrect in English context |
| google/gemma-3-1b-it | The title is difficult to squeeze out of Claude. It took about half an hour to complete. I also subscribed to Cursor, and their billing system is very expensive. Each month, running up to one billion tokens is no problem...Claude wants to have fun, so he's going to max out his subscription. |
严重
[准确性]
"The title is difficult to squeeze out of Claude."
理由: '难蚌' is slang meaning 'speechless/awkward', not 'difficult to squeeze'. Completely mistranslated. | The translation contains several inaccuracies and awkward phrasing. | Complete hallucination/mistranslation. '难蚌' is internet slang for 'speechless' or 'hard to hold back laughter' (similar to 'lmao' or 'unbelievable'). The hypothesis invents a meaning about squeezing titles. | Major mistranslation - '难蚌' means 'speechless' (difficult to believe), not 'difficult to squeeze out'. The entire title translation is wrong. Content also has errors: 'very expensive' should be 'not a problem', 'wants to have fun' should be 'wants to use freely' | Completely inaccurate. '难蚌' is slang meaning 'speechless/unbelievable', not 'difficult to squeeze out of'. The sentence structure is wrong. |
| google/gemma-3-4b-it | Ugh, Claude Code finished its daily subscription limit in just half an hour! |
轻微
[风格]
"Ugh"
理由: '难蚌' is internet slang for 'speechless' or 'hard to hold back laughter/cringe'. 'Ugh' expresses disgust, which is slightly off. | Style: '难蚌' is a Chinese internet slang meaning 'speechless' or 'hard to swallow', 'Ugh' is too casual | Adds informal interjection not present in source; minor stylistic deviation. |
| google/translategemma-12b-it | Title: Difficult Clam, Claude Code exceeded the daily limit of the Pro subscription in just half an hour. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Annoying, Claude Code finished its daily subscription limit in half an hour |
轻微
[风格]
"Annoying"
理由: '难蚌' is internet slang for 'speechless' or 'hard to hold back laughter', 'Annoying' misses the nuance. | Translation of '难蚌' (speechless/cringe) is too weak; 'Speechless' or 'Unbelievable' would be more accurate | Conveys complaint but loses slang nuance of “难蚌/无语”; closer to “speechless” would match style. |
| tencent/HY-MT1.5-1.8B | Translation of '难蚌,Claude Code半小时就蹬完了Pro订阅的每日限额' in en |
严重
[准确性]
"Translation of '难蚌,Claude Code半小时就蹬完了Pro订阅的每日限额' in en"
理由: Hypothesis is a placeholder string rather than an actual translation | Provides only a meta-phrase; fails to translate slang '难蚌' (speechless/awkward) and the idea of exhausting the Pro daily limit in half an hour. | The hypothesis is a meta-description string, not the actual translation. It fails to translate the slang '难蚌' or the rest of the text. |
| tencent/HY-MT1.5-7B | "en": "Title: Hard-shelled clams; Claude Code finished using up the daily limit for Pro subscriptions in just half an hour.", "zh-TW": "标题:硬壳蛤蜊;Claude Code在半小时内就用完了Pro订阅的每日额度!", "ja": "タイトル:硬殻のカキ;Claude Codeはわずか30分でProサブスクリプションの日額限度を使い切った!", "ko": "제목: 단단한 껍데기의 조개들; Claude Code는 30분 만에 Pro 구독의 일일 한도를 모두 사용했다!", "de":="Titel: Hartschalige Muscheln; Claude Code hat die tägliche Grenze für Pro-Abonnements bereits nach einer halben Stunde aufgebraucht.", "fr":="Titre : Moules à coquille dure ; Claude Code a épuisé la limite quotidienne des abonnements Pro en seulement une demi-heure.", "it":="Titolo: Conchiglie a guscio duro; Claude Code ha esaurito il limite giornaliero degli abbonamenti Pro in soli 30 minuti."}|im_end|> |
轻微
[准确性]
"Hard-shelled clams"
理由: Mistranslation of the slang '难蚌' (nan beng, meaning speechless/ridiculous) as a literal description of clams. | The hypothesis provides multiple language translations when only English was requested. More critically, it mistranslates '难蚌' (slang expression meaning 'speechless/unbelievable') as 'Hard-shelled clams', which is a literal but incorrect interpretation. The reference shows it should be 'Speechless'. The content is completely missing. | "难蚌" is internet slang meaning "speechless/awkward"; literal translation to clams loses the idiomatic meaning. | Mistranslation: '难蚌' (nán bèng) is internet slang meaning 'hard to believe' or 'speechless' (from '难绷'), not literal 'hard-shelled clams'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Open Source】CCG v1.7.55 : Claude Code Orchestration CLI Collaboration | Codex + Gemini + Claude |
轻微
[流畅性]
"Claude Code Orchestration CLI Collaboration"
理由: Should be 'Claude Code orchestrates 3-CLI collaboration' for better clarity and accuracy | Unnatural phrasing; clearer would be "Claude Code orchestrates 3-CLI collaboration" as in reference. | Noun stacking makes the phrase clunky. 'Orchestrates 3-CLI collaboration' (reference) is clearer. |
| CohereLabs/tiny-aya-water | This feature is only available on desktop version. |
致命
[准确性]
"This feature is only available on desktop version."
理由: The hypothesis does not match the source or reference content. | Hypothesis is completely unrelated to source content about CCG v1.7.55 and Claude Code orchestration | Complete hallucination. The text is unrelated to the source, which describes an open-source workflow tool (CCG) for multi-model collaboration. | Completely unrelated translation - source is about CCG v1.7.55 tool, hypothesis is about desktop version feature | Entirely unrelated to source about CCG v1.7.55 open-source tool; no shared meaning. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Open Source] CCG v1.7.55 : Claude Code Arrangement Three CLI Collaboration | Codex + Gemini + Claude Content: GitHub : GitHub - fengshao1227/ccg-workflow: Multi-model collaboration development toolkit - Based on Claude Code CLI... |
严重
[流畅性]
"Arrangement Three CLI Collaboration"
理由: Ungrammatical/unnatural phrasing; should be something like “orchestrates 3-CLI collaboration”, harming clarity. | Awkward phrasing. 'Orchestrates 3-CLI collaboration' is much more natural and accurate for '编排'. | Should be 'Claude Code orchestrates 3-CLI collaboration' - awkward phrasing in hypothesis |
| Qwen/Qwen2.5-7B-Instruct | Title: [Open Source] CCG v1.7.55 : Claude Code Arrangement Three CLI Collaboration | Codex + Gemini + Claude Content: GitHub : GitHub - fengshao1227/ccg-workflow: A set of multi-model collaborative development tools - based on Claude Code CLI... |
严重
[流畅性]
"Arrangement Three CLI Collaboration"
理由: Unnatural and stiff translation of '编排三 CLI 协作'. 'Orchestrates 3-CLI collaboration' (as in reference) is much smoother. | Unnatural phrase; should be “orchestrates 3 CLI tools” or similar. “Arrangement Three” is incorrect English syntax. | Should be 'orchestrates' not 'Arrangement' - better captures the technical meaning |
| Qwen/Qwen3-14B | Title: [Open Source] CCG v1.7.55 : Claude Code Orchestration for Three CLI Collaborations | Codex + Gemini + Claude | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: [Open Source] CCG v1.7.55: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude Content: GitHub: GitHub - fengshao1227/ccg-workflow: Multi-model collaborative development toolkit - Based on Claude Code CLI... |
轻微
[流畅性]
"Orchestration Three CLI Collaboration"
理由: Should be 'orchestrates 3-CLI' for better grammar and flow | Awkward phrasing. 'Orchestrates 3-CLI collaboration' (as in reference) is more natural. | Unnatural phrase order; should be ‘orchestrates three-CLI collaboration’ or similar, though meaning can be inferred. |
| Qwen/Qwen3.5-9B | Title: [Open Source] CCG v1.7.55: Claude Code Orchestration for Three CLI Collaborations | Codex + Gemini + Claude Content: GitHub: GitHub - fengshao1227/ccg-workflow: Multi-model collaborative development toolkit - Based on Claude Code CLI... |
轻微
[流畅性]
"orchestration for Three CLI Collaborations"
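理由: Slightly awkward phrasing compared to 'orchestrates 3-CLI collaboration', but meaning is preserved. | The structure is awkward and repetitive; “orchestrates 3-CLI collaboration” is more natural. The current wording is still understandable, so this is a minor fluency issue. | Unnecessary addition. |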
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Open Source] CCG v1.7.55: Claude Code Compilation with Three CLI Collaborations | Codex + Gemini + Claude | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: [Open Source] CCG v1.7.55: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude Content: GitHub: GitHub - fengshao1227/ccg-workflow: A multi-model collaborative development toolkit - Based on Claude Code CLI... |
严重
[流畅性]
"Orchestration Three CLI Collaboration"
理由: Should be 'orchestrates 3-CLI' for better fluency and accuracy. The phrasing is awkward. | Unnatural phrasing. 'Orchestrates 3-CLI collaboration' or 'Three-CLI Orchestration' is grammatically superior. | Unnatural phrase; should be "orchestrates three CLI tools" or "3-CLI collaboration" as in reference |
| google/gemma-3-1b-it | Translation of '【开源】CCG v1.7.55 : Claude Code 编排三 CLI 协作 | Codex + Gemini + Claude' in en |
严重
[准确性]
"Translation of '【开源】CCG v1.7.55 : Claude Code 编排三 CLI 协作 | Codex + Gemini + Claude' in en"
理由: The translation is incomplete and does not cover the entire source text. | Hypothesis is not a proper translation - it's a placeholder text that doesn't translate the source content | The hypothesis is a meta-commentary describing a translation task rather than providing the actual translation. | No translation content is given; it's just a meta description of the task. |
| google/gemma-3-4b-it | Title: [Open Source] CCG v1.7.55: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude Content: GitHub: GitHub - fengshao1227/ccg-workflow: A multi-model collaboration development toolset - based on Claude Code CLI... |
轻微
[流畅性]
"Orchestration Three CLI Collaboration"
理由: Word order is awkward; more natural would be ‘orchestrates three-CLI collaboration’ or ‘3 CLI collaboration’. | Should be 'orchestrates 3-CLI' not 'Orchestration Three CLI' for better grammar and fluency. | Awkward phrasing. '3-CLI collaboration' or 'collaboration of three CLIs' is more natural. |
| google/translategemma-12b-it | Title: [Open Source] CCG v1.7.55: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude Content: GitHub: GitHub - fengshao1227/ccg-workflow: A multi-model collaborative development toolset based on Claude Code CLI... |
轻微
[流畅性]
"Orchestration Three CLI Collaboration"
理由: Awkward phrasing. '3-CLI collaboration' or 'collaboration of three CLIs' is more natural. | Should be 'orchestrates 3-CLI' not 'Orchestration Three CLI'. The reference uses more natural phrasing. | Ungrammatical; should be “orchestrates three-CLI collaboration” or similar, though meaning is inferable. |
| google/translategemma-4b-it | Title: [Open Source] CCG v1.7.55: Claude Code Orchestration Three CLI Collaboration | Codex + Gemini + Claude Content: GitHub: GitHub - fengshao1227/ccg-workflow: A multi-model collaboration development toolset based on Claude Code CLI... |
轻微
[流畅性]
"Orchestration Three CLI Collaboration"
理由: Awkward phrase; should be something like “orchestrates three CLIs” but meaning remains understandable. | Should be 'orchestrates 3-CLI' not 'Orchestration Three CLI'. The reference uses better phrasing. | Awkward phrasing; 'orchestrates 3-CLI collaboration' is smoother. |
| tencent/HY-MT1.5-1.8B | Translation of 'CCG v1.7.55 : Claude Code 编排三 CLI 协作 | Codex + Gemini + Claude' |
严重
[准确性]
"Translation of 'CCG v1.7.55 : Claude Code 编排三 CLI 协作 | Codex + Gemini + Claude'"
理由: Hypothesis is a placeholder string rather than an actual translation | No translation output; also should clarify that Claude Code orchestrates collaboration among three CLIs. | The hypothesis is a meta-description string, not the actual translation. It fails to translate the title and content. |
| tencent/HY-MT1.5-7B | "en": "Title: [Open Source] CCG v1.7.55: Claude Code Layout Three CLI Collaboration | Codex + Gemini + Claude", "zh-TW": "Title: [開源] CCG v1.7.55:Claude Code佈局三CLI協作 | Codex + Gemini + Claude", "ja": "Title: [オープンソース] CCG v1.7.55:Claude Codeレイアウト三CLI協力 | Codex + Gemini + Claude", "ko": "Title: [오픈소스] CCG v1.7.55: Claude Code 레이아웃 세CLI 협력 | Codex + Gemini + Claude", "de":="Titel: [Open Source] CCG v1.7.55: Claude Code-Bereichsplanung mit drei CLI-Tools | Codex + Gemini + Claude", "fr": "Titre: [Open Source] CCG v1.7.55: Organisation du code Claude avec trois outils CLI | Codex + Gemini + Claude", "it": "Titolo: [Open Source] CCG v1.7.55: Organizzazione del codice Claude con tre strumenti CLI | Codex + Gemini + Claude"} |
严重
[准确性]
"Claude Code Layout Three CLI Collaboration"
理由: "编排" here means orchestrate; "Layout Three" is incorrect. It should be "orchestrates 3-CLI collaboration". | Awkward and inaccurate translation of '编排三 CLI 协作'. 'Orchestrates 3-CLI collaboration' (as in reference) is more accurate than 'Layout'. | The hypothesis provides multiple language translations when only English was requested. The translation 'Layout Three CLI Collaboration' is awkward and inaccurate; the reference shows it should be 'orchestrates 3-CLI collaboration'. The content is completely missing. | Mistranslation: 'Layout Three' should be 'orchestrates 3' or 'coordinates 3' - '编排' means orchestrate/arrange, not layout. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: When I used GPT-5.4 on claude code, after updating constraint documents using ccg:team | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | ccg:team update constraint document |
严重
[准确性]
"ccg:team update constraint document"
理由: The hypothesis is too vague compared to the reference. | The hypothesis is a fragmented phrase that fails to convey the full meaning of the source (Title and Content). It misses the context of using GPT-5.4, the time taken (2.5h), and the exhaustion of limits. | Hypothesis is incomplete fragment that does not translate the full title; missing context about GPT-5.4 and time spent | Fragment that omits the main complaint about time taken, resource exhaustion, and task simplicity, present in source/reference. | Incomplete translation - only partial title translated, missing full context about GPT-5.4 in Claude Code |
| Qwen/Qwen2.5-14B-Instruct | Title: When I used gpt5.4 in claudecode, after using ccg:team to update constraint documents Content: He spent 2.5 hours on such a simple task of updating the document... exhausted the 5-hour limit of two team accounts and made 93 requests at the public station but still not enough |
轻微
[术语]
"public station"
理由: Literal translation of “公益站”; “free API proxy site”/“free shared API site” would be clearer. | 'Free API proxy site' is the consistent community term for '公益站'. | Should be 'free API proxy site' or 'free shared API site' not 'public station' | Should be 'free API proxy site' to match reference terminology |
| Qwen/Qwen2.5-7B-Instruct | Title: When I used gpt5.4 in claudecode and updated constraints documents using ccg:team, Content: RT updating the document was such a simple task, yet he spent 2.5 hours... exhausted my team's 5-hour limit by 2x + 93 requests from the public station still wasn't enough | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Dopo aver utilizzato gpt5.4 in claudecode e aggiornato il documento delle restrizioni con ccg:team |
严重
[准确性]
"Dopo aver utilizzato gpt5.4 in claudecode e aggiornato il documento delle restrizioni con ccg:team"
理由: The hypothesis omits the entire Content section, which contains critical details about the failure (time taken, limits drained). | Incomplete translation - only title provided, content missing. Also 'gpt5.4' should be 'GPT-5.4'. | Translation is in Italian instead of Chinese (target language zh) |
| Qwen/Qwen3.5-4B | Title: When I used gpt5.4 in claudecode and updated the constraint document using cccg:team, Content: RT - Such a simple task as updating documents took him 2.5 hours... It drained my two team accounts' 5-hour limit + 93 requests from the public site were still not met. |
严重
[准确性]
"cccg:team"
理由: Should be 'ccg:team' not 'cccg:team' | Typo in the tool name. Source says 'ccg:team', hypothesis says 'cccg:team'. | Typo; should be 'ccg:team' not 'cccg:team' | Mistranscribed tag as ‘cccg:team’ instead of ‘ccg:team’, introducing an incorrect command name. |
| Qwen/Qwen3.5-9B | Title: When I used gpt-5.4 in claudecode and updated constraint docs using ccg:team, Content: Updating such a simple doc took him 2.5h... drained my two team quotas (5h limit) + public station 93 requests without satisfying me. |
轻微
[术语]
"public station"
理由: '公益站' is better translated as 'free API proxy site' or 'community site' in this context; 'public station' sounds like a transport hub. | Should be 'free API proxy site' or '公益站' | Should be 'free API proxy site'. | In this community's usage, “公益站” means “free API proxy site / free shared API site”; the literal rendering “public station” is unnatural and unclear. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | When I put gpt-5.4 into claudecode and update the constraint documents after using ccg:team updates... | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: When I use gpt5.4 in Claudecode, updating constraint documents after using ccg:team... |
严重
[准确性]
"..."
理由: Incomplete translation. Missing the full context. Should include 'When I use GPT-5.4 in Claude Code and update the constraint document using ccg:team' with proper punctuation. | Omission of the source content detailing the time taken (2.5h) and the exhaustion of limits (5h limit + 93 requests). | Body content about task duration and resource consumption is omitted entirely; only title translated | Missing content section which is present in the reference. |
| google/gemma-3-1b-it | The simple task of updating the 'rt' document using ccg:team, after the 2.5h update, resulted in me losing two team hours and a 93 request without satisfaction. |
严重
[准确性]
"the 'rt' document"
理由: Incorrect translation - 'losing' should be 'draining/exhausting', '93 request' should be '93 requests from the free API proxy site', 'without satisfaction' should be "and it's still not satisfied" - missing the context of the free API proxy site (公益站) | Mistranslation of 'rt'. In Chinese social media context, 'rt' stands for '如题' (as the title says / see title), not a document name. | The translation is slightly awkward and could be more natural. | Inaccurate translation. 'rt' is not a document name but means 'as the title says'. The sentence structure and meaning are significantly distorted. | 'rt' in source means 'as the title says', not the name of a document. |
| google/gemma-3-4b-it | Title: When I use gpt5.4 in claudecode and update constraint documents using ccg:team, Content: It took him 2.5h to complete such a simple task as updating the documentation… exhausting my two team accounts’ 5h limit + 93 requests on the charity station is not enough. |
轻微
[术语]
"charity station"
理由: Should be 'free API proxy site' not 'charity station' to match reference terminology. | Unnatural for ‘公益站’; ‘free API proxy site’ or ‘public welfare site’ would be clearer. | '公益站' is better translated as 'free API proxy site' or 'community site' in this tech context. 'Charity station' sounds like a physical donation booth. |
| google/translategemma-12b-it | Title: When I use gpt5.4 in Claudecode, after updating the constraint document using ccg:team... | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: When I use gpt5.4 in claudecode and update the constraint documents with ccg:team, the result is... | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '当我把gpt5.4用在claudecode里,使用ccg:team更新约束文档后' in en |
严重
[准确性]
"Translation of '当我把gpt5.4用在claudecode里,使用ccg:team更新约束文档后' in en"
理由: Hypothesis is a placeholder string rather than an actual translation | Missing actual English sentence; should express using GPT-5.4 in Claude Code and updating the constraint document with ccg:team. | The hypothesis is a meta-description string, not the actual translation. It fails to translate the source text. |
| tencent/HY-MT1.5-7B | When I used GPT-5.4 in ClaudeCode and updated the constraint document using ccg:team |
严重
[准确性]
"When I used GPT-5.4 in ClaudeCode and updated the constraint document using ccg:team"
理由: Content section was completely omitted from translation. | Incomplete translation: The hypothesis only translates the title and omits the entire 'Content' section detailing the time taken and resource usage. | The hypothesis only provides the title translation and completely omits the content. The reference includes both title and content translations. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Breaking News! Tencent officially entered the scene, QQ can now access Lóngxiāo, WeChat pill |
严重
[准确性]
"WeChat pill"
理由: Should be 'WeChat is doomed' not 'WeChat pill'; '药丸' is slang meaning something is finished/doomed | Should be 'WeChat is doomed/done for' not 'WeChat pill' - wrong idiom translation | "药丸" is slang meaning "doomed"; literal 'pill' is incorrect and confusing. | Literal mistranslation of slang '药丸' (doomed/finished) as 'pill'. |
| CohereLabs/tiny-aya-water | Title: Breaking News! Tencent officially entered the game, QQ can now access Longshou, WeChat pill |
严重
[术语]
"Longshou"
理由: Incorrect transliteration. '龙虾' in this context is the project 'OpenClaw'. 'Longshou' is a literal Pinyin that loses the specific community reference. | Incorrect translation - '龙虾' refers to OpenClaw, not 'Longshou' (literal translation) | Incorrect rendering; should refer to "龙虾"/OpenClaw, not "Longshou". |
| Qwen/Qwen2.5-14B-Instruct | Title: [Latest News] Tencent Officially Supports QQ Integration with Lobster, WeChat is Doomed Content: Originally struggling to figure out how to get my wife to use it, I saw in the group that QQ bots can directly integrate with lobster. Now I'm already using it; it's really simple, just three commands. |
严重
[准确性]
"lobster"
理由: Fails to identify '龙虾' as the slang for 'OpenClaw'. | Uses generic “lobster” instead of the established product name “OpenClaw”, losing the specific reference to the tool. | Should be 'OpenClaw' not 'Lobster'. '龙虾' is slang for OpenClaw in this context. | The translation incorrectly uses 'Lobster' instead of 'OpenClaw' as in the reference. |
| Qwen/Qwen2.5-7B-Instruct | Title: [Breaking News] Tencent Officially Enters QQ Can Now Integrate Lobster, WeChat Is Doomed Content: Originally struggled to find a way to let my wife play with it, but saw that the QQ robot can directly integrate Lobster in the group chat, and we're already playing now, really very simple just three commands. |
严重
[准确性]
"Lobster"
理由: Failed to translate the slang '龙虾' as 'OpenClaw', leading to confusion about what is being integrated. | Community term “龙虾” refers to OpenClaw; translating literally as “Lobster” loses the product name. | Incorrect translation of '龙虾' which should be 'OpenClaw' in this context. |
| Qwen/Qwen3-14B | 【Notizia】Tencent entra ufficialmente nel mercato, QQ può ora accedere a Longha, WeChat è escluso |
严重
[准确性]
"Longha"
理由: Translation is in Italian instead of Chinese (target language zh) | The hypothesis omits the Content section. It mistranslates '龙虾' as 'Longha' (pinyin) instead of 'OpenClaw', and '药丸' (slang for doomed/finished) as 'escluso' (excluded), losing the slang meaning. | "龙虾" refers to OpenClaw; "Longha" is a malformed transliteration, losing the OpenClaw name and meaning. |
| Qwen/Qwen3.5-4B | Title: [Breaking News] Tencent Officially Joins In: QQ Can Now Connect to Lobster, WeChat is Broken Content: Originally struggling with how to get my wife to play, I saw in the group that QQ bots can directly connect to Lobster. It's now working, and it's really simple—just three commands. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [Breaking News] Tencent Officially Steps In: QQ Can Now Access LinuxDo Community Context for Lobster; WeChat Is Finished Content: I was struggling to figure out how to give my wife something fun to play with until I saw in the group that QQ bots could directly integrate with LinuxDo community context for lobster. It's now working and is really simple—just three commands. |
严重
[准确性]
"LinuxDo community context for Lobster"
理由: Both the source and the reference say “connect to/integrate OpenClaw”, which has nothing to do with “LinuxDo community context”; the hypothesis adds a nonexistent “community context” concept and drops the OpenClaw name. | Hypothesis adds 'LinuxDo Community Context for' which is not in source; reference simply uses 'OpenClaw' - hypothesis over-explains | Mistranslation: '龙虾' refers to OpenClaw, not LinuxDo context | Severe hallucination/mistranslation. '接入龙虾' means 'connect to OpenClaw'. The hypothesis invents 'LinuxDo community context' and keeps 'Lobster' literal. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Breaking News: QQ can now connect to LinuxDo through Tencent's official interface. WeChat pills. |
致命
[准确性]
"connect to LinuxDo"
理由: Severe mistranslation: '龙虾' refers to OpenClaw, not 'LinuxDo'. Failed to recognize context | Mistranslates '龙虾' (OpenClaw) as 'LinuxDo'. LinuxDo is the forum name, OpenClaw is the tool. | '龙虾' is slang for 'OpenClaw', not 'LinuxDo'. Should be 'QQ can now integrate with OpenClaw' | “接入龙虾” refers to integrating OpenClaw, not LinuxDo
轻微
[准确性]
"WeChat pills"
理由: Mistranslation: '微信药丸' means 'WeChat is doomed/dying', not 'WeChat pills' | Literal rendering of “药丸”; idiom means “is doomed” or “is done for”, not “pills” | Literal translation of '药丸' (homophone for '完蛋' - doomed/finished). Should be 'WeChat is doomed'. | '微信药丸' is slang meaning 'WeChat is doomed', not 'WeChat pills' |
| google/gemma-3-12b-it | Breaking News: Tencent Officially Steps In - QQ Can Now Integrate Lobster! WeChat is Feeling Bad. |
严重
[准确性]
"WeChat is Feeling Bad"
理由: Weak translation of '药丸' (slang for 'doomed' or 'finished', pun on 'yao wan'). 'Feeling bad' understates the meaning. | '微信药丸' is internet slang meaning 'WeChat is doomed', not 'WeChat is Feeling Bad'. | "药丸" is slang for "doomed"/"finished"; "feeling bad" understates and alters the meaning
严重
[准确性]
"Lobster"
理由: Should refer to "OpenClaw" integration; generic "Lobster" loses specific product name | Failure to translate the slang '龙虾' as 'OpenClaw', leading to confusion. | '龙虾' is slang for 'OpenClaw', not literal lobster. Should be 'OpenClaw'. |
| google/gemma-3-1b-it | 【Quick News】 Tencent officially announced that QQ can be integrated into the dragon shrimp, and WeChat pill is now available. |
严重
[准确性]
"WeChat pill is now available"
理由: 'dragon shrimp' should be 'OpenClaw' (proper noun). 'WeChat pill' is incorrect; '微信药丸' is slang meaning 'WeChat is doomed'. Missing the content about QQ bots and setup instructions. | Major mistranslation - '龙虾' refers to OpenClaw (AI tool), not 'dragon shrimp'. '微信药丸' means 'WeChat is doomed/finished', not 'WeChat pill is available' | Complete mistranslation of the slang '微信药丸' (WeChat is doomed/finished). The model translated '药丸' (yào wán) literally as 'pill' instead of recognizing it as a homophone for '要完' (yào wán - to be finished). | '微信药丸' is slang meaning 'WeChat is doomed', not that a 'WeChat pill' is available. |
| google/gemma-3-4b-it | Breaking News: Tencent’s downfall, and now QQ can access ‘Long Xia’! WeChat pill. Originally struggling to play for wife, saw in the group that QQ robots can directly access ‘Long Xia’, now I can play it, really simple, just three commands. |
严重
[准确性]
"Tencent’s downfall"
理由: Source '腾讯官方下场' means 'Tencent officially steps in/enters the fray'. Hypothesis mistranslates '下场' (enter the field) as 'downfall' (ruin). | Mistranslation: '腾讯官方下场' means 'Tencent officially steps in' (gets involved), not 'Tencent's downfall' | Completely wrong; source means ‘Tencent officially steps in’, not ‘downfall’. | Should be '[Breaking] Tencent officially steps in' not 'Tencent's downfall'.
严重
[准确性]
"WeChat pill"
理由: Mistranslation. Should be 'Tencent officially steps in, QQ can now integrate with OpenClaw, WeChat is doomed'. '龙虾' is OpenClaw, not 'Long Xia'. | Literal translation of '微信药丸' (WeChat is doomed/finished, slang 'yao wan'). 'WeChat pill' is nonsensical in English. | Literal translation of ‘药丸’ as ‘pill’ is incorrect; should convey ‘WeChat is doomed’ or similar. | Mistranslation: '微信药丸' means 'WeChat is doomed/finished' not 'WeChat pill' |
| google/translategemma-12b-it | Breaking News: Tencent Officially Supports Integration with Lobster, WeChat is Feeling the Pressure. |
严重
[准确性]
"WeChat is Feeling the Pressure"
理由: Should be 'WeChat is doomed' to match the source meaning of '药丸' (meaning 'is done for/finished') | Should be 'WeChat is doomed' not 'WeChat is Feeling the Pressure'. '微信药丸' is internet slang meaning WeChat is doomed/finished. | The translation softens the reference's 'WeChat is doomed'. | Original “微信药丸” expresses “WeChat is doomed”; “feeling the pressure” significantly softens the meaning. |
| google/translategemma-4b-it | Breaking News: Tencent officially shuts down QQ, now you can access Lobster! WeChat pill |
致命
[准确性]
"officially shuts down QQ"
理由: Severe mistranslation: '下场' means 'to step in' or 'get involved', not 'shut down'. This reverses the meaning. | Severe mistranslation. “下场 QQ 可以接入龙虾了” means Tencent officially steps in and QQ can integrate with OpenClaw, not shutting QQ down. | Incorrect: '下场' means 'steps in/participates', not 'shuts down' | Multiple errors: Should be '[Breaking] Tencent officially steps in' not 'shuts down'; 'Lobster' should be 'OpenClaw'; 'WeChat pill' should be 'WeChat is doomed'.
严重
[准确性]
"WeChat pill"
理由: Incorrect translation: '药丸' means 'doomed/finished' not 'pill' | Literal translation of slang '药丸' (doomed/finished), resulting in nonsense. | Literal rendering of “药丸” loses idiomatic meaning “is doomed / in trouble”. |
| tencent/HY-MT1.5-1.8B | Translation of '快讯' in en |
严重
[准确性]
"Translation of '快讯' in en"
理由: The hypothesis is a meta-description string, not the actual translation. It fails to translate the title and content, missing the specific slang and context. | Hypothesis is a placeholder string rather than an actual translation. Only translates a single word '快讯' without context from the full title | Does not give an English word; should translate as something like 'Breaking' or 'News flash' in this context. |
| tencent/HY-MT1.5-7B | "en": "Title: [Breaking News] Tencent Officially Supports QQ Connecting to ‘LinuxDo’; WeChat Also Available", "zh-TW": "Title: [Breaking News] Tencent Officially Supports QQ Connecting to ‘LinuxDo’; WeChat Also Available", "ja": "Title: [Breaking News] Tencent Officially Supports QQ Connecting to ‘LinuxDo’; WeChat Also Available", "ko": "Title: [Breaking News] Tencent Officially Supports QQ Connecting to ‘LinuxDo’; WeChat Also Available", "de":="Titel: [Breaking News] Tencent unterstützt offiziell die Verbindung von QQ mit ‘LinuxDo’; WeChat ist ebenfalls verfügbar", "fr": "Titre: [Breaking News] Tencent soutient officiellement la connexion de QQ à ‘LinuxDo’; WeChat est également disponible", "it": "Titolo: [Breaking News] Tencent supporta ufficialmente la connessione di QQ a ‘LinuxDo’; WeChat è anch’esso disponibile"} |
致命
[准确性]
"connecting to ‘LinuxDo’; WeChat Also Available"
理由: Mistranslation of '龙虾' (OpenClaw) as 'LinuxDo'. LinuxDo is the forum name, not the tool being integrated. | Source says QQ can connect to OpenClaw and implies "WeChat is doomed"; here it claims connection to LinuxDo and that WeChat is available, reversing and fabricating key facts. | The hypothesis provides multiple language translations when only English was requested. More critically, it mistranslates '龙虾' as 'LinuxDo' when the reference shows it should be 'OpenClaw'. The translation also misses the informal tone and the actual meaning ('WeChat is doomed'). The content is completely missing. | Wrong entity: 'LinuxDo' should be 'OpenClaw' - completely wrong translation of the entity name. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: [v0.1] I broke down research workflows into a bunch of skills, making codex/cc like a worker |
轻微
[风格]
"like a worker"
理由: Misses the slang nuance of '打工人' (corporate slave/grind) captured in the reference. | Misses colloquial nuance "打工人一样拉磨" ('grind like corporate slaves'); tone is softened. | Should be 'like corporate slaves' or 'like a grind worker' for more accurate tone |
| CohereLabs/tiny-aya-water | Title: [v0.1] I broke down research workflows into a bunch of skills, making codex/cc like a worker | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: [v0.1] I broke down research processes into a bunch of Skills, making codex/cc work like regular employees grinding away Content: Research pipelines as semantic execution units: each skill declares inputs/outputs, acc | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: [v0.1] I broke down my research process into skills, making codex/cc work like a worker. Content: Research pipelines as semantic execution units: each skill declares inputs/outputs, acc | 通过 (无共识错误) |
| Qwen/Qwen3-14B | [v0.1] Ho suddiviso il mio flusso di lavoro di ricerca in una serie di competenze, facendo in modo che codex/cc lavori come un dipendente diligente che tira la ruota della mola |
严重
[准确性]
"lavori come un dipendente diligente che tira la ruota della mola"
理由: The hypothesis omits the Content section. The translation of '拉磨' as 'tira la ruota della mola' is overly literal and misses the idiomatic sense of 'grinding' like a 'corporate slave' (打工人) mentioned in the reference context. | Source metaphor "打工人一样拉磨" is closer to "corporate slave"/overworked employee; "dipendente diligente" softens the negative connotation. | Translation is in Italian instead of Chinese (target language zh) |
| Qwen/Qwen3.5-4B | Title: [v0.1] I broke down my research process into a bunch of Skills, so codex/cc works like an employee pulling a cart Content: Research pipelines as semantic execution units: each skill declares inputs/outputs, acc | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: [v0.1] I decomposed research workflows into a bunch of Skills, making codex/cc work like laborers pulling a millstone Content: Research pipelines as semantic execution units: each skill defines inputs/outputs, acc |
轻微
[风格]
"laborers pulling a millstone"
理由: Should be 'corporate slaves' or 'grind like 打工仔' to match original metaphor | In modern usage the source term “打工人” carries connotations of exploitation and wage slavery, and the reference uses “corporate slaves”; “laborers pulling a millstone” is too literal and does not match the contemporary internet tone. | Over-literal translation of '拉磨' (working like a donkey/corporate slave). While descriptive, it loses the specific cultural nuance of 'corporate slave' (打工人) captured in the reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | I broke down the research process into a bunch of skills, making codex/cc work like a laborer grinding away. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: [v0.1] I've broken down my research workflow into a bunch of Skills, allowing codex/cc to work like an employee pulling ropes. |
严重
[准确性]
"work like an employee pulling ropes"
理由: Source metaphor "打工人一样拉磨" means grinding like corporate drones; "employee pulling ropes" is incorrect and unnatural, losing the "grind" connotation | '拉磨' is an idiom meaning 'grinding like corporate slaves' or 'working hard repetitively', not 'pulling ropes'. Should be 'grind like corporate slaves'. | Should be 'grind like corporate slaves' (拉磨 means to grind/work like a donkey) |
| google/gemma-3-1b-it | Research pipelines as semantic execution units: each skill declares inputs/outputs, acc | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: [v0.1] I’ve broken down research workflows into a set of skills, letting codex/cc work like office workers grinding. |
轻微
[风格]
"office workers grinding"
理由: '打工人' is specific slang for 'corporate slaves' or 'wage earners'. 'Office workers' loses the self-deprecating nuance. | Inaccurate translation: Reference uses 'corporate slaves' (a specific term), hypothesis uses generic 'office workers' | Softens slang ‘打工人’ which is closer to ‘corporate slaves’; nuance partially lost. |
| google/translategemma-12b-it | Title: [v0.1] I've broken down the research process into a bunch of Skills, allowing codex/cc to work like regular employees. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: [v0.1] I have broken down research workflows into a set of skills, so that codex/cc can work like a diligent worker. |
轻微
[风格]
"diligent worker"
理由: Translation is acceptable but 'corporate slaves' in reference captures the slang tone better | Softens the humorous/slang nuance of “打工人 / corporate slaves”; less edgy but gist is preserved. | Sanitization of slang '打工人' (corporate slave/worker bee); 'diligent worker' loses the self-deprecating humor. |
| tencent/HY-MT1.5-1.8B | Title: [v0.1] 我把科研流程拆成了一堆 Skills,让codex/cc像打工人一样拉磨 |
严重
[准确性]
"[v0.1] 我把科研流程拆成了一堆 Skills,让codex/cc像打工人一样拉磨"
理由: Hypothesis simply copies the Chinese source instead of translating; misses informal metaphor about making Codex/Claude Code grind like office workers. | The hypothesis simply repeats the source Chinese text instead of translating it into English. | The translation is overly verbose and does not match the concise nature of the reference. |
| tencent/HY-MT1.5-7B | Title: [v0.1] Breaking down research processes into skills to make codex/cc function like a worker |
轻微
[风格]
"to make codex/cc function like a worker"
理由: Missing '[v0.1]' version prefix from the original title. | Reference uses strong colloquial metaphor "grind like corporate slaves"; "function like a worker" significantly softens the tone. | Loss of nuance. '拉磨' (pulling a mill) implies mindless, repetitive labor ('grind like corporate slaves'), which 'function like a worker' fails to capture fully. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Slightly modified to fix the 400 issue related to converting cpa to cc | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Slightly modified to fix the 400 issue related to converting cpa to cc | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Made some adjustments to fix the 400 issue converting cpa to cc Content: It seems that the new version of claude code has updated validation rules, causing an unknown parameter error for the tools.defer_loading parameter: | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Slightly modified to fix 400 issues for cpa to cc Content: The new version of Claude code seems to have updated validation rules, and tools.defer_loading parameter causes unknown parameter errors: | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Corretto il problema del codice 400 durante la conversione da cpa a cc |
轻微
[准确性]
"durante la conversione da cpa a cc"
理由: The hypothesis omits the Content section which explains the technical cause (validation rules, unknown parameter). | "转" here is more like routing via CPA to Claude Code; "conversione" is slightly misleading but overall understandable. | Incomplete translation - only title provided, content missing. Also 'routing' is more accurate than 'conversione' (conversion). | Translation is in Italian instead of Chinese (target language zh) |
| Qwen/Qwen3.5-4B | Title: Fixed an issue with cpas being converted to cc (LinuxDo community context). Content: It seems that the new version of Claude Code has updated validation rules; using the tools.defer_loading parameter may cause an unknown parameter error: |
严重
[准确性]
"cpas being converted to cc (LinuxDo community context)"
理由: The hypothesis adds '(LinuxDo community context)' which is not in the source, and misinterprets 'cpa 转 cc 的 400 问题' (the 400 error when routing CPA to CC) as 'cpas being converted to cc'. | Should be 'routing CPA to Claude Code' not 'cpas being converted to cc' | Mistranslation; should be 'CPA to Claude Code' not 'cpas being converted to cc'. Also missing the '400 error' context. |
| Qwen/Qwen3.5-9B | Title: Fixed an issue where CPA was converted to CC (LinuxDo community). Content: The new version of Claude Code seems to have updated validation rules; the tools.defer_loading parameter causes unknown parameter errors: |
严重
[准确性]
"CPA was converted to CC (LinuxDo community)."
理由: Complete mistranslation of '把 cpa 转 cc 的 400 问题'. It means 'the 400 error issue when routing/converting CPA to CC'. The hypothesis invents 'LinuxDo community' and misinterprets the grammar. | Mistranslation: 'cpa转cc' means routing CPA to Claude Code, not conversion to LinuxDo community | The source means “把cpa转cc的400错误问题修了”, i.e. the 400 error was fixed; the hypothesis mentions “LinuxDo community” and never states the 400 error, deviating from the source. | Hypothesis adds '(LinuxDo community)' which is not in source; reference simply states 'when routing CPA to Claude Code' - hypothesis adds incorrect context |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Slightly modified to fix the 400 error when converting cpa to cc | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Slightly modified to fix the 400 issue by converting CPA to cc Content: The new version of Claude Code seems to have updated validation rules; the tools.defer_loading parameter causes an unknown parameter error: |
轻微
[准确性]
"converting CPA to cc"
理由: Source and reference indicate routing CPA requests to Claude Code; "converting CPA to cc" is unclear and less accurate than "routing to Claude Code" | Should be 'routing CPA to Claude Code' for better accuracy. 'Converting' doesn't capture the technical meaning of the operation. | Ambiguous phrasing. 'Routing CPA to Claude Code' or 'forwarding' is more precise than 'converting' in a proxy context. |
| google/gemma-3-1b-it | Translation of '稍微改了下把cpa转cc的400问题修了' in en |
严重
[准确性]
"Translation of '稍微改了下把cpa转cc的400问题修了' in en"
理由: Hypothesis is not a translation but a meta-description. It should provide the actual English translation, not describe what should be translated. | No translation is provided; only a task description. | The hypothesis is a meta-commentary describing a translation task rather than providing the actual translation. | Hypothesis is not a proper translation - it's a placeholder text that doesn't translate the source content | The translation is incomplete and does not cover the entire source text. |
| google/gemma-3-4b-it | Title: Made some minor changes to fix the 400 issue for converting cpa to cc. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Slightly modified to fix the issue where CPA was incorrectly converted to cc. |
严重
[准确性]
"where CPA was incorrectly converted to cc"
理由: Should be 'fixed the 400 error issue when routing CPA to Claude Code' not 'fix the issue where CPA was incorrectly converted to cc'. The source refers to a specific 400 error, not conversion. | Original refers to fixing a 400 HTTP error when routing CPA to Claude Code, not an incorrect conversion; misrepresents error type and target. | Should be 'routing CPA to Claude Code' - the issue is about routing/redirecting, not conversion | Misinterpretation of 'cpa 转 cc'. In this context, it likely refers to routing or proxying requests from CPA to Claude Code (cc), not a file conversion. The '400 problem' refers to an HTTP error, which is missed. |
| google/translategemma-4b-it | Title: Fixed a minor issue where CPA was incorrectly converted to cc. |
严重
[准确性]
"CPA was incorrectly converted to cc"
理由: Complete misinterpretation of the source. The source says 'fixed the 400 error when converting CPA to CC', not that the conversion logic itself was incorrect. Also omits content. | Incorrect meaning. Should be 'Made a slight modification and fixed the 400 error issue when routing CPA to Claude Code' - the original is about fixing a routing error, not a conversion issue. | Incorrect translation: source says '400问题' (400 error issue), not 'minor issue' | Meaning is about fixing a 400 error when routing CPA to Claude Code, not about incorrect format conversion to cc. |
| tencent/HY-MT1.5-1.8B | Translation of '稍微改了下把cpa转cc的400问题修了' in en |
严重
[准确性]
"Translation of '稍微改了下把cpa转cc的400问题修了' in en"
理由: Hypothesis is a placeholder string rather than an actual translation | Only meta-text; should state that a small change fixed the 400 error when converting/routing CPA to CC (Claude Code). | The hypothesis is a meta-description string, not the actual translation. It fails to translate the source text. |
| tencent/HY-MT1.5-7B | Title: Made some minor changes to fix the 400 error when converting CPA to CC |
轻微
[准确性]
"Made some minor changes to fix the 400 error when converting CPA to CC"
理由: Mistranslation: 'routing CPA to Claude Code' would be more accurate than 'converting CPA to CC' - CPA is a proxy tool, not a format to convert. | Slight imprecision. '转' in this context likely refers to routing or proxying traffic ('routing CPA to Claude Code') rather than converting the tools themselves. | The translation 'converting CPA to CC' is imprecise. The reference shows it should be 'routing CPA to Claude Code' which better captures the technical meaning. The content is completely missing. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: A tool for batch account management using CLI Proxy API: CliproxyAccountCleaner | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: A tool for batch account management of CLI Proxy API: CliproxyAccountCleaner | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: A CLI Proxy API bulk account management tool: CliproxyAccountCleaner Content: Recently received many free codex accounts from seniors, but these accounts easily expire or run out quickly when shared among many people. Tried some scripts written by friends but was not satisfied, so I made one myself! | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: A CLI Proxy API Batch Account Manager Tool: CliproxyAccountCleaner Content: Recently received many free codex accounts from big players, but these accounts tend to expire easily or get exhausted quickly when shared by multiple people. Some scripts written by friends didn't meet my expectations, so I made one myself! | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Uno strumento per la gestione di massa degli account dell'API Proxy CLI: CliproxyAccountCleaner |
严重
[准确性]
"Uno strumento per la gestione di massa degli account dell'API Proxy CLI: CliproxyAccountCleaner"
理由: Incomplete translation - only title provided, content missing. | The hypothesis omits the entire Content section describing the motivation and functionality. | Translation is in Italian instead of Chinese (target language zh) |
| Qwen/Qwen3.5-4B | Title: A tool for batch account management via CLI Proxy API: CliproxyAccountCleaner Content: Recently received many free codex accounts shared by experts, but these accounts expire easily or run out of quota quickly when shared among multiple users. Tried some scripts written by friends but were not satisfied, so I made one myself! | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: A tool for CLI Proxy API batch account management: CliproxyAccountCleaner Content: Recently received many free codex accounts from experts, but these accounts expire easily or run out of quota quickly when shared among multiple users. I'm not satisfied with some scripts written by friends, so I made one myself! | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | A tool for CLI proxy API bulk account management: CliproxyAccountCleaner | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: A tool for batch management of proxy accounts via CLI Proxy API: CliproxyAccountCleaner Content: I've recently received many free codex accounts from generous users, but these accounts often expire quickly or lose their quota when shared by multiple people. I wasn't satisfied with some scripts written by friends, so I made my own! | 通过 (无共识错误) |
| google/gemma-3-1b-it | A tool for managing bulk account cleaning for CLI proxies: CliproxyAccountCleaner |
轻微
[术语]
"for CLI proxies"
理由: 'managing bulk account cleaning' is awkward. Should be 'batch account management' as in the reference. The phrasing is less natural. | Source: 'CLI Proxy API'; 'CLI proxies' is close but omits 'API' nuance, though still understandable. | 'account cleaning' should be 'account management' - the tool manages accounts, not just cleans them |
| google/gemma-3-4b-it | Title: A tool for batch account management of CLI Proxy API: CliproxyAccountCleaner Content: Recently received many free Codex accounts from seniors, but these accounts easily expire or are quickly shared by multiple people, leading to depletion of their quotas. I’ve tried some scripts written by friends and haven’t been very satisfied, so I made my own! | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: A tool for batch account management of CLI Proxy APIs: CliproxyAccountCleaner Content: I recently received many free Codex accounts from experts, but these accounts often expire quickly or have their quotas exhausted when shared by multiple users. I wasn't satisfied with some scripts written by friends, so I created my own! | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: A tool for CLI Proxy API bulk account management: CliproxyAccountCleaner Content: Recently received many free codex accounts from seniors, but these accounts are easily invalid or quickly shared by multiple people, resulting in no quota. I have tried scripts written by friends, but they were not satisfactory, so I made my own! | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of '一个用于CLI Proxy API 批量账号管理的工具:CliproxyAccountCleaner' in en |
严重
[准确性]
"Translation of '一个用于CLI Proxy API 批量账号管理的工具:CliproxyAccountCleaner' in en"
理由: Hypothesis is a placeholder string rather than an actual translation | No explicit translation; should render as 'A tool for batch account management for CLI Proxy API: CliproxyAccountCleaner' or similar. | The hypothesis is a meta-description string, not the actual translation. It fails to translate the source text. |
| tencent/HY-MT1.5-7B | "en": "Title: A Tool for Batch Account Management via CLI Proxy API: CliproxyAccountCleaner", "zh-TW": "标题:一款用于CLI Proxy API批量账户管理的工具:CliproxyAccountCleaner", "ja": "标题:CLI Proxy APIを利用した一括アカウント管理ツール:CliproxyAccountCleaner", "ko": "标题:CLI Proxy API를 활용한 일괄 계정 관리 도구:CliproxyAccountCleaner", "de":="Titel: Ein Tool zur batchbasierten Verwaltung von Konten über die CLI Proxy API: CliproxyAccountCleaner", "fr":="Titre : Un outil pour la gestion par lots des comptes via l’API CLI Proxy : CliproxyAccountCleaner", "it":="Titolo: Uno strumento per la gestione batch degli account tramite l’API CLI Proxy : CliproxyAccountCleaner"} |
严重
[准确性]
"Title: A Tool for Batch Account Management via CLI Proxy API: CliproxyAccountCleaner"
理由: Minor preposition difference ("for" vs "via") but overall meaning preserved; no substantial error. | The hypothesis provides multiple language translations when only English was requested. The content is completely missing. | The hypothesis contains multiple language translations mixed together instead of providing a single target translation. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: How to configure the model when using CLI Proxy for claude cc-switch? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: How to configure models on claude cc-switch using CLI Proxy? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: How to configure the model in claude cc-switch using CLI Proxy? Content: How to configure the model in claude cc-switch using CLI Proxy? Directly calling from the official website works fine, but I get an error when using CLI Proxy. I have already subscribed to pro. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: How to configure models in claude cc-switch when using CLI Proxy relay? Content: How to configure models in claude cc-switch when using CLI Proxy relay? Direct API call works fine, but there's an error via CLI Proxy. Already subscribed to pro. | 通过 (无共识错误) |
| Qwen/Qwen3-14B | How to configure models in claude cc-switch using CLI Proxy? |
严重
[准确性]
"How to configure models in claude cc-switch using CLI Proxy?"
理由: Translation is in English instead of Chinese (target language zh) | The hypothesis omits the Content section which provides specific error details and context about the Pro subscription. | Incomplete translation - only title provided, content missing. Also should be 'CC-Switch' with capitals. |
| Qwen/Qwen3.5-4B | Title: How to configure models in claude cc-switch when using CLI Proxy as a proxy? Content: It works fine calling directly from the official website, but an error occurs when using CLI Proxy. I have already purchased a Pro subscription. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: How to configure models in Claude CC-Switch via CLI Proxy? Content: I have no problem calling directly from the official website, but when using CLI Proxy, an error occurs. I've already upgraded to Pro subscription. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | How to configure the model in the claude cc-switch using CLI Proxy? | 通过 (无共识错误) |
| google/gemma-3-12b-it | How to configure the model in the claude cc-switch when using CLI Proxy tunneling? |
严重
[准确性]
"How to configure the model in the claude cc-switch when using CLI Proxy tunneling?"
理由: Omission of the source content describing the error behavior and the Pro subscription status. | Should be 'How to configure the model in CC-Switch for Claude proxied through CLI Proxy?' for better accuracy and consistency with reference. | Missing content section which is present in the reference. |
| google/gemma-3-1b-it | How to configure the Claude cc-switch model using CLI Proxy? | 通过 (无共识错误) |
| google/gemma-3-4b-it | How to configure the model in claude cc-switch using CLI Proxy? | 通过 (无共识错误) |
| google/translategemma-12b-it | How to configure the model in the claude cc-switch when using CLI Proxy for forwarding? | 通过 (无共识错误) |
| google/translategemma-4b-it | How to configure the model in claude cc-switch using CLI Proxy? | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | How to configure the model in Claude cc-switch using CLI Proxy? | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | "en": "How to configure models in Claude CC-Switch using the CLI Proxy? Direct calls to the official website work without issues, but errors occur when using the CLI Proxy despite having purchased a Pro subscription.", "zh-TW": "如何使用CLI Proxy在Claude CC-Switch中配置模型?直接访问官方网站没有问题,但使用CLI Proxy时会遇到错误,尽管已经购买了Pro订阅服务。", "ja": "CLAude CC-SwitchでCLI Proxyを使用してモデルをどのように設定するか?公式ウェブサイトからの直接呼び出しでは問題なく動作するが、CLI Proxyを使用するとエラーが発生する。Proサブスクリプションを購入しているにもかかわらずだ。」, "ko": "CLAude CC-Switch에서 CLI Proxy를 사용하여 모델을 어떻게 설정하는지? 공식 웹사이트를 통한 직접 호출은 문제없이 이루어지지만, CLI Proxy를 사용하면 오류가 발생한다. Pro 구독 서비스를 구매했음에도 불구하고 말이다.", "de": "Wie konfiguriert man Modelle in Claude CC-Switch mithilfe des CLI Proxies? Direkte Anrufe über die offizielle Website funktionieren ohne Probleme, aber bei Verwendung des CLI Proxies treten Fehler auf – obwohl eine Pro-Abonnement gebucht wurde.", "fr": "Comment configurer les modèles dans Claude CC-Switch à l’aide du proxy CLI ? Les appels directs depuis le site officiel se déroulent sans problème, mais des erreurs surviennent lors de l’utilisation du proxy CLI, même si une abonnement Pro a été acheté.", "it": "Come configurare i modelli in Claude CC-Switch utilizzando il proxy CLI? Gli appel diretti dal sito ufficiale funzionano senza problemi, ma si verificano errori quando si utilizza il proxy CLI, nonostante sia stato acquistato un abbonamento Pro." | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: CPA Sets Thinking Level, For Those Afraid of Being Shadowbanned |
严重
[准确性]
"For Those Afraid of Being Shadowbanned"
理由: '降智' literally means 'intelligence reduction' (downgraded performance). 'Shadowbanned' is a related but distinct concept, though acceptable in context, 'downgraded' is more precise. | Changes the meaning from '怕降智的看这里' | "降智" here means performance/intelligence downgrade, not shadowban; mistranslation of key concept. | Should be 'Setting the reasoning effort in CPA. Look here if you are afraid of being downgraded/shadowbanned' - missing context about 'reasoning effort' |
| CohereLabs/tiny-aya-water | Title: CPA Sets Thinking Time, For Those Afraid of Being Shadowbanned | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Impostazioni di pensiero CPA per chi teme il depotenziamento |
致命
[准确性]
"Impostazioni di pensiero CPA per chi teme il depotenziamento"
理由: The hypothesis is in Italian, but the target language is Chinese (zh). Complete language mismatch. | Hypothesis is in Italian, not English. Should be in English. Correct translation should be 'Setting the reasoning effort in CPA. Look here if you are afraid of being downgraded/shadowbanned' | Italian translation where target language/context is Chinese/English; omits mention of ‘setting reasoning effort’ and ‘shadowban/downgrade’ and changes language entirely. | Entire title is in Italian instead of English - completely wrong language | The translation is in Italian instead of Chinese, which is a critical error. |
| Qwen/Qwen2.5-7B-Instruct | Title: CPA Set Thinking Capacity, Fear Degradation? Look Here Content: There's been discussion about CPA reversing Codex degradation issues, but CPA documentation explains this clearly. |
严重
[准确性]
"CPA reversing Codex degradation issues"
理由: Misinterpretation of 'CPA 反代 Codex 降智'. It means 'CPA reverse-proxying Codex causes intelligence degradation', not 'reversing degradation issues'. | Source: “CPA反代Codex降智的问题” (reverse proxy causing downgrade/shadowban). Current rendering misses the “reverse proxy” aspect and “降智” nuance. | Mistranslation: 'CPA reverse proxying Codex causing a downgrade/shadowban' - the reference clarifies this is about CPA reverse proxy causing issues, not just 'reversing degradation'. | Should be 'reverse proxying Codex causing a downgrade/shadowban' - more technically accurate |
| Qwen/Qwen3-14B | Titolo: Impostazioni CPA per la quantità di pensiero, chi ha paura di essere depotenziato guardi qui |
轻微
[准确性]
"depotenziato"
理由: Incomplete translation - only title provided, content missing. Also 'depotenziato' is awkward; should be 'downgraded/shadowbanned'. | The hypothesis omits the Content section. 'Depotenziato' is a weak translation for '降智' (downgraded intelligence/shadowbanned); 'Content' is also missing. | Translation is in Italian instead of Chinese (target language zh) | "降智" is about intelligence/performance degradation, sometimes like "dumbed down" or "nerfed"; "depotenziato" captures part of it but misses intelligence-specific nuance. |
| Qwen/Qwen3.5-4B | Title: Set thinking amount for CPA; those fearing nerfing should look here Content: Discussed issues regarding CPA reverse-substituting Codex getting nerfed; actually, the CPA documentation explains this very clearly. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Set thinking limit for CPA, check here if you fear nerfed; Content: There's discussion about CPA reverse proxy Codex being downgraded, but the CPA docs clarify this clearly. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: CPA Set Thinking Volume, Be Afraid of Shadowbanning Here Content: There's discussion about CPA anti-Codex shadowbanning issues. Actually, the CPA documentation explains it clearly. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: CPA Setting Thinking Volume, See Here if you're worried about being nerfed. Content: I saw discussions about Codex being nerfed through CPA proxies, but the CPA documentation explains this very clearly. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Translation of 'CPA 设置思考量,怕降智的看这里' in en |
严重
[准确性]
"Translation of 'CPA 设置思考量,怕降智的看这里' in en"
理由: Hypothesis is not a translation but a meta-description. It should provide the actual English translation, not describe what should be translated. | No actual translation given; only repeats source plus meta instruction. | Hypothesis is not a proper translation - it's a placeholder text that doesn't translate the source content | The hypothesis is a meta-commentary describing a translation task rather than providing the actual translation. | The translation is incomplete and does not cover the entire source text. |
| google/gemma-3-4b-it | Title: Setting Thinking Volume for CPA, Those Afraid of Being Shadowbanned Should See Here |
轻微
[准确性]
"Those Afraid of Being Shadowbanned"
理由: Source includes ‘downgraded/shadowbanned’; half of nuance omitted. | '降智' literally means 'reduce intelligence' (model dumbing down). 'Shadowbanned' is a different concept. 'Downgraded' or 'lobotomized' is more accurate. | Mistranslation: '降智' means 'being downgraded' (in intelligence/capability), not 'shadowbanned' |
| google/translategemma-12b-it | Title: Configure the number of thoughts for CPA; see here if you are concerned about performance degradation. |
严重
[准确性]
"performance degradation"
理由: Should be 'downgraded/shadowbanned' to match the source meaning of '降智' (intelligence reduction/shadowban) | “降智” here means model intelligence degradation/dumbing down (or shadowban), not generic performance; mistranslation of key concern. | Should be 'Setting the reasoning effort in CPA. Look here if you are afraid of being downgraded/shadowbanned'. '降智' refers to shadowban/downgrade, not generic performance degradation. | '降智' (jiang zhi) is slang for 'reduced intelligence' or 'lobotomized' model behavior. 'Performance degradation' is too generic. |
| google/translategemma-4b-it | Title: Setting Thinking Quantity for CPA, Those Afraid of Being 'Shadowbanned' Should See Here Content: I saw discussions about Codex being 'shadowbanned' due to CPA reverse engineering, but actually, the CPA documentation clearly explains this. | 通过 (无共识错误) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: CPA 设置思考量,怕降智的看这里' in en |
严重
[准确性]
"Translation of 'Title: CPA 设置思考量,怕降智的看这里' in en"
理由: Hypothesis is a placeholder string rather than an actual translation | Meta-text only; lacks translation of '思考量' (reasoning effort) and slang '怕降智的看这里' about fearing downgrade/shadowban or reduced intelligence. | The hypothesis is a meta-description string, not the actual translation. It fails to translate the source text. |
| tencent/HY-MT1.5-7B | "en": "Title: Setting Thinking Capacity for CPA; Those Afraid of 'Intellectual Decline' Should Read This", "zh-TW": "Title: 设置CPA的思考能力;害怕‘智力下降’的人请阅读此内容", "ja": "Title: CPAの思考能力設定;「知能低下」を恐れる方はこちらをご覧ください", "ko": "Title: CPA의 사고 능력 설정; ‘지능 저하’를 걱정하는 분들은 여기를 참고하세요", "de":="Titel: Einstellung der Denkfähigkeit für CPA; Diejenigen, die Angst vor einer „Verminderung der Intelligenz“ haben, sollten dies lesen", "fr":="Titre: Réglage de la capacité de réflexion pour CPA; Ceux qui craignent une “diminution de l’intelligence” doivent lire ceci", "it":="Titolo: Impostazione della capacità di pensiero per CPA; Chi teme una “riduzione dell’intelligenza” dovrebbe leggere questo"}|im_end|> |
严重
[准确性]
"those Afraid of 'Intellectual Decline'"
理由: The hypothesis provides multiple language translations when only English was requested. More critically, it mistranslates '降智' as 'Intellectual Decline' when the reference shows it should be 'downgrade/shadowbanned'. The content is completely missing. | "降智" here refers to model quality downgrade/shadowban when proxied, not literal intellectual decline of users; mistranslation of key concept. | Mistranslation of slang '降智' (jiang zhi). In the context of AI models, this refers to the model becoming 'stupider' or 'downgraded' in performance, not a human's intellectual decline. | The hypothesis contains multiple language translations mixed together instead of providing a single target translation. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: I've been inactive for a long time, I adjusted some things at the free API site | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: I've been inactive for a while, and the free API site has made some adjustments. | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Haven't posted in a long time, made some adjustments on the free shared API site Content: Upgraded to GPT-5.4 model and switched groups; the original kyx group has been changed to codex group... As donors had already switched from kyx to cat大佬 (cat laoda) early on. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Haven't posted in a while, adjusted some things on the public proxy Content: Upgraded to the gpt-5.4 model, then reorganized the groups, changing the kyx group to the codex group... Because the channel donors switched from kyx to CatLord long ago | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Non ho pubblicato da molto tempo, il sito gratuito ha apportato alcune modifiche |
轻微
[准确性]
"il sito gratuito"
理由: "公益站" specifically implies public-welfare/free proxy; "sito gratuito" is close but omits public-welfare/community nuance. | Incomplete translation - only title provided, content missing. | The hypothesis omits the Content section which details specific changes (GPT-5.4, group renaming, donor change). | Translation is in Italian instead of Chinese (target language zh) |
| Qwen/Qwen3.5-4B | Title: Haven't posted in a while; adjusted some things on the free shared API site Content: Upgraded to the gpt-5.4 model and switched groups—changed from kyx group to codex group... since long ago channel donors had already switched from kyx to catme | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Haven't posted in a while, made some adjustments to the free shared API site Content: Upgraded to the gpt-5.4 model, then switched groups; changed the original kyx group to codex... because channel donors had already switched from kyx to CatMeister long ago. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: It's been a while since I posted. The public proxy has made some adjustments. | 通过 (无共识错误) |
| google/gemma-3-12b-it | It's been a while since I posted. The public proxy has been adjusted. |
严重
[准确性]
"The public proxy"
理由: Missing translation of '公益站' - should be 'free API proxy site' not just 'public proxy'. Also missing specific model name 'gpt-5.4' and all details about grouping changes (kyx to codex, kyx to maomaolao) | "公益站" is a free/public-benefit API proxy site; "public proxy" omits the free/public-benefit aspect | Omission of the source content detailing the specific changes: adding gpt-5.4, changing groupings from kyx to codex, and the donor change. | Incomplete translation. Missing content about the model update and grouping changes. Should include details about gpt-5.4 model and group changes. | Missing content section which is present in the reference. |
| google/gemma-3-1b-it | Title: Good long time hasn't posted anymore, the free API site adjusted something. |
轻微
[流畅性]
"Good long time hasn't posted anymore, the free API site adjusted something."
理由: Severe grammatical error and unnatural phrasing. Should be 'Haven't posted in a long time'. | The title format is slightly off - missing the 'Content:' label that appears in reference | Unnatural and ungrammatical English; e.g., 'Good long time hasn't posted anymore' is incorrect. | Awkward phrasing. Should be 'Haven't posted in a long time' or 'It's been a long time since I last posted'. The current phrasing is unnatural. |
| google/gemma-3-4b-it | Title: It’s been a long time since I posted. The free shared API site has made some adjustments. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: It's been a while since I posted anything; the public API site has made some adjustments. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: It's been a long time since I posted. The free shared API site has been adjusted. |
轻微
[术语]
"free shared API site"
理由: “公益站” is more precisely a free API proxy site/public-good site; “shared API site” is slightly vague. | Should be 'free API proxy site' to match reference terminology | Should be 'the free API proxy site' which is more specific terminology used in the reference. |
| tencent/HY-MT1.5-1.8B | Translation of '好久没发过帖子了,公益站调整了一下东西' in en |
严重
[准确性]
"Translation of '好久没发过帖子了,公益站调整了一下东西' in en"
理由: No English rendering; should convey 'Haven't posted in a long time, tweaked some things on the free API proxy site'. | Hypothesis is a placeholder string rather than an actual translation | The hypothesis is a meta-description string, not the actual translation. It fails to translate the source text. |
| tencent/HY-MT1.5-7B | Title: It's been a while since I last posted something on the public proxy site; the settings have been adjusted. |
轻微
[准确性]
"It's been a while since I last posted something on the public proxy site; the settings have been adjusted."
理由: Content section was completely omitted from translation - the detailed changes (gpt-5.4 model, group changes from kyx to codex) were not translated. | Loses detail that it's specifically a free API proxy site and omits mention of gpt-5.4 model and group changes. | Slight deviation. '公益站' is better translated as 'free API proxy site' to match the specific community terminology used in other items. | The hypothesis only provides the title translation and completely omits the content. The reference includes both title and content translations. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: codex, What tools do you guys use for managing your account pools? Also, registration machines? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: codex, What tools do you guys use for managing your account pools? Also, what about registration machines? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: codex, what account pool management tools are you guys using? Also, registration machines Content: What account pool management tools are you guys using? Also, registration machines, so many people use the latest codex | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: codex, what account pool management tools do you guys use? Also, any registration keys? |
严重
[准确性]
"registration keys"
理由: Mistranslation: 'registration keys' should be 'account generators' - '注册机' refers to tools that generate accounts, not keys. | Mistranslation of '注册机' (registration machine/generator) as 'registration keys'. A 'machine' implies a tool/script, not just the keys themselves. | “注册机” means an account generator/registration bot, not registration keys. | Should be 'account generators' not 'registration keys' - different meaning |
| Qwen/Qwen3-14B | codex |
致命
[准确性]
"codex"
理由: Incomplete translation - only one word provided. Should include full title and content translation. | The hypothesis is a severe truncation, providing only the word 'codex' and omitting the entire Title and Content. | Translation is incomplete - only contains 'codex' instead of full Chinese translation | Missing key information about tools. | Hypothesis contains only the word "codex" and omits essentially all source information about tools and account generators. |
| Qwen/Qwen3.5-4B | Title: codex, What account pool management tools do you guys use? Also registration machines. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: codex, what account pool management tool do you guys use? Also registration bots. Content: What account pool management tools are you guys using? And registration bots. So many people are using the latest codex. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | What tool do you use for number pool management? And how about the registration machine? | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: codex, what account pool management tool are you guys using, and also what registration machine? | 通过 (无共识错误) |
| google/gemma-3-1b-it | What is a token pool management tool that codex users use? |
严重
[准确性]
"What is a token pool management tool that codex users use?"
理由: Incomplete translation - missing the second part of the source about '注册机' (account generators). Also 'token pool' should be 'account pool' - the source refers to account management, not tokens | Mistranslation of '号池' (account pool). The hypothesis changes it to 'token pool', which refers to a different technical concept. | Incomplete and inaccurate. Should be 'What account pool management tools are you bros using, and account generators?' Missing 'account' and 'generators' components. | Source asks about 'account pool management tools' and 'account generators', not 'token pool'. |
| google/gemma-3-4b-it | Title: codex, what pool management tools do you guys use, and also registration keys? | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: codex, what account pool management tools are you guys using? Also, what registration tool? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: codex, what account pools do you guys use to manage? Also, what about registration tools? |
轻微
[流畅性]
"what account pools do you guys use to manage?"
理由: Slight grammatical awkwardness; understandable but not idiomatic. | Awkward syntax. 'What account pool management tools are you using?' is better. | Should be 'what account pool management tools are you bros using' - more natural phrasing and includes 'management tools'. |
| tencent/HY-MT1.5-1.8B | Title: Codex, what kind of account pool management tool do fellow gamers use? Also, registration machines and the latest Codex that many people use. | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Title: What number pool management tools do friends use? Also, what about registrars? |
严重
[准确性]
"What number pool management tools do friends use? Also, what about registrars?"
理由: Content section was completely omitted - the second sentence 'Latest codex这么多人人用' (so many people using the latest Codex) was not translated. | The translation is incomplete and inaccurate. It omits the context of 'Codex' and mistranslates '注册机' as 'registrars' when it should be 'account generators'. The content is completely missing. | "号池" is account pool, and "注册机" is account generator; "number pool" and "registrars" are misleading, and Codex context is omitted. | Mistranslation of '号池'. In this context, it refers to an 'account pool' (of login credentials), not a pool of telephone numbers. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: I don't know how my cc became like this? Help | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: I don't know how my cc became like this? Help | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: My cc somehow turned out like this? Help needed Content: Help, help, it looks so silly now. It seems I just let it clear its cache by itself. | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: My cc is now like this? Help needed Content: Help, help, it looks so silly. I think I just let it clean up the cache by itself. |
轻微
[准确性]
"it looks so silly"
理由: Minor nuance: reference says 'it became so stupid' which is closer to '变傻' (became stupid) vs 'looks so silly'. | '傻' in this context (AI behavior) usually means 'stupid' or 'dumb' (low performance), whereas 'silly' implies playful foolishness. 'Stupid' is more accurate. | “变得好傻” is closer to “became so dumb/stupid” (re performance/intelligence), not just appearance. |
| Qwen/Qwen3-14B | Il mio cc non sa come fare e è diventato così? Aiuto necessario |
严重
[准确性]
"non sa come fare"
理由: Source means "I don't know how it became like this"; hypothesis says "my cc does not know what to do", changing subject and meaning. | The hypothesis omits the Content section. The translation of the title is awkward ('non sa come fare' implies the CC doesn't know how to do something, rather than the user not knowing what happened to the CC). | Incomplete translation - only title provided, content missing. Also awkward phrasing; should be 'I don't know how my Claude Code ended up like this?' | Translation is in Italian instead of Chinese (target language zh) |
| Qwen/Qwen3.5-4B | Title: My cc doesn't know how to do this and has become like this? Asking for help Content: Asking for help, asking for help, becoming so stupid, I seem to have just let it clean up its cache. | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: My cc somehow turned out like this? Help Content: Help, help, it looks so silly now. I think I just let it clear its cache. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | My cc doesn't seem to know how to fix itself. I'm getting really confused. |
严重
[准确性]
"My cc doesn't seem to know how to fix itself. I'm getting really confused."
理由: Adds self-repair notion not present and misses key complaint that Claude Code became dumb after clearing cache | Mistranslation: Should be 'My Claude Code ended up like this' | Completely misinterprets '不知道咋弄的变成这样了' (don't know how it ended up like this) as 'doesn't know how to fix itself'. | Incorrect translation. Should be 'I don't know how my Claude Code ended up like this? Please help' |
| google/gemma-3-12b-it | Title: My cc doesn't know how to do it and turned out like this? Help Content: Help, help, it became so stupid. I think I just let it clear its cache. | 通过 (无共识错误) |
| google/gemma-3-1b-it | Help! I don't know how to turn my cc into this messed up state. | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: My cc is broken and showing this issue. Please help! | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: My cc is messed up and I don't know how to fix it? Help! | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: My cc is not working properly and has become like this? Please help. |
轻微
[准确性]
"is not working properly"
理由: Source emphasizes becoming ‘stupid’ rather than generically malfunctioning; some nuance lost but main help request is intact. | Should be 'ended up like this' to match the reference. | Vague translation of '变成这样' (became like this) and '好傻' (so stupid/dumb). The reference captures the 'stupid' nuance better. | Source asks about a specific state (became stupid), not just 'not working properly' |
| tencent/HY-MT1.5-1.8B | Translation of '我的cc不知道咋弄的变成这样了?求助' in English |
严重
[准确性]
"Translation of '我的cc不知道咋弄的变成这样了?求助' in English"
理由: No actual translation given; should express confusion about how Claude Code ended up like this and a request for help. | Hypothesis is a placeholder string rather than an actual translation | The hypothesis is a meta-description string, not the actual translation. It fails to translate the source text. |
| tencent/HY-MT1.5-7B | "en": "Help, help! My cc has changed into something so strange. I think I just let it clean its cache automatically.", "zh-TW": "幫助,幫助!我的cc變得這麼奇怪。我想我只是讓它自動清除了緩存吧。", "ja": "助けて、助けて!私のccがこんなに変わってしまった。たぶん自動的にキャッシュを削除してしまったみたい。, "ko": "도와주세요, 도와주세요! 제 cc가 이렇게 이상해졌어요. 아마 자동으로 캐시를 삭제한 것 같아요., "de": "Hilfe, Hilfe! Mein cc hat sich so seltsam verändert. Ich glaube, ich habe es einfach automatisch auf seine Cache gelöscht lassen.", "fr": "Aidez-moi, aidez-moi ! Mon cc a changé de manière si étrange. Je pense que j’ai simplement laissé faire l’opération de nettoyage du cache automatiquement.", "it": "Aiuto, aiuto! Il mio cc è diventato così strano. Credo di aver semplicemente lasciato che facesse automaticamente il pulizia del cache." |
轻微
[准确性]
"changed into something so strange"
理由: Reference emphasizes it "became so stupid" (behavioral degradation), not just strange appearance; nuance is softened. | The hypothesis contains multiple language translations mixed together instead of providing a single target translation. | Nuance loss. '好傻' means 'so stupid' or 'dumb', which is more specific than 'strange' in the context of AI behavior. | The hypothesis provides multiple language translations when only English was requested. The title translation is also incomplete - it should specify 'Claude Code' instead of just 'cc'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is the公益站GPT from Soda down? Content: Suddenly, GPT-5.4 stopped working and got stuck at 3:00 PM, everyone else also experienced this issue. |
严重
[准确性]
"Soda"
理由: Incorrect translation; should be 'Is the GPT on Xiaodai's free API proxy site down?' - '小呆' is a name (Xiaodai), not 'Soda' | Missing translation of Content: 'Suddenly GPT-5.4 is unusable and stuck at 15:00. Is everyone stuck?' | "小呆" refers to Xiaodai, not Soda; wrong proper noun. | Mistranslation of '小呆' (Xiaodai, a name/nickname) as 'Soda'. |
| CohereLabs/tiny-aya-water | Title: Is the Smaller Goodwill Station GPT crashing? |
轻微
[术语]
"Smaller Goodwill Station"
理由: The translation uses 'Smaller Goodwill Station' instead of 'Xiaodai's free API proxy site' and 'crashing' instead of 'down', which is slightly less accurate. | Should be 'Xiaodai's free API proxy site' to match reference; 'Smaller Goodwill Station' is awkward translation | Misrenders "小呆" which is a username/handle, not "Smaller Goodwill Station"; also "公益站" is free/public-benefit site, not "Goodwill Station". |
| Qwen/Qwen2.5-14B-Instruct | Title: Is XiaoDai's Free Shared API Site GPT Down? Content: Suddenly GPT-5.4 is not working, stuck at 15:00, is everyone experiencing issues? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: Free Shared API Site GPT Is Down Content: Suddenly GPT-5.4 is unavailable, stuck at 15:00. Is everyone experiencing this? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Title: Is Xiaodai Public Proxy GPT Down? Content: Suddenly, GPT-5.4 can't be used and is stuck at 15 points. Are everyone else having the same issue? | 通过 (无共识错误) |
| Qwen/Qwen3.5-4B | Title: Is LittleDai Free Shared API Site GPT down? Content: Suddenly GPT-5.4 is stuck at 15 points, is everyone experiencing issues? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Is Xiao Dai Public Proxy GPT down? Content: Suddenly GPT-5.4 stopped working at 15:00, is everyone stuck? |
轻微
[术语]
"Xiao Dai Public Proxy GPT"
理由: The reference uses “Xiaodai's free API proxy site”; the current “Public Proxy GPT” is slightly unnatural and does not convey “site” (站点). | Should be 'Xiaodai's free API proxy site' or '小呆公益站' | Hypothesis uses 'Xiao Dai Public Proxy GPT' but reference uses 'Xiaodai's free API proxy site' - reference is more natural |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Free Shared API Site GPT Is Down? Are you all stuck? |
轻微
[流畅性]
"Are you all stuck?"
理由: Slightly informal but acceptable; meaning matches source | Should be 'Is the GPT on Xiaodai's free API proxy site down?' for better clarity and specificity | A bit abrupt, but acceptable. 'Is everyone experiencing this?' might be smoother. |
| google/gemma-3-12b-it | Title: Xiaodai Public Proxy GPT is it down? Content: Suddenly GPT-5.4 stopped working and is stuck at 15:00, is everyone experiencing issues? | 通过 (无共识错误) |
| google/gemma-3-1b-it | The GPT-5.4 model is stuck at 15 points and everyone is stuck? |
严重
[准确性]
"15 points"
理由: Inaccurate translation. '15点' means '15:00' (3 PM), not '15 points'. Should be 'stuck at 15:00' or 'stuck since 3 PM'. Missing context about the service being down. | '15点' refers to 15:00 (3 PM) time, not '15 points'. | Mistranslation of '15 点'. In this context, it refers to the time '15:00' (3 PM), not '15 points' (score). |
| google/gemma-3-4b-it | Is Little Dai Public API Site GPT down? |
轻微
[术语]
"Little Dai Public API Site"
理由: Should be 'Xiaodai's free API proxy site' to match reference terminology. | Inconsistent translation: '小呆公益站' should be 'Xiaodai's free API proxy site' - 'Little Dai' is a nickname, should be transliterated or explained | ‘小呆公益站’ is better as ‘Xiaodai’s free API proxy site’; ‘Public API Site’ loses ‘free/public welfare’ nuance. |
| google/translategemma-12b-it | Title: Is Xiaodai Public Proxy GPT Down? Content: Suddenly, GPT-5.4 is not working and stuck at 15:00. Are you all experiencing issues? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Is the Xiaodai Public API Site GPT down? |
轻微
[术语]
"Public API Site"
理由: “公益站” is specifically a free/public-good proxy site; “Public API Site” may miss the ‘free proxy’ nuance but is close. | Should be 'free API proxy site' to match the reference. | Should be 'Is the GPT on Xiaodai's free API proxy site down?' - more precise terminology and better structure. |
| tencent/HY-MT1.5-1.8B | Translation of '小呆公益站GPT 是挂了吗' in en |
严重
[准确性]
"Translation of '小呆公益站GPT 是挂了吗' in en"
理由: Hypothesis is a placeholder string rather than an actual translation | Meta-phrase only; should translate the site name and ask whether its GPT service is down. | The hypothesis is a meta-description string, not the actual translation. It fails to translate the source text. |
| tencent/HY-MT1.5-7B | Title: Is the Xiao Dai Public Welfare Site GPT down? | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: The Dragon Crab is out of the bag today, two colleagues asked me about openclow |
严重
[准确性]
"The Dragon Crab"
理由: Source says '龙虾' (lobster/crayfish) which should be translated as 'OpenClaw' based on context and reference, not 'Dragon Crab' | Mistranslation: '龙虾' should be 'OpenClaw' (a software name), not 'Dragon Crab'. '火出圈了' means 'gone mainstream/popular', not 'out of the bag'. 'openclow' is a typo of 'OpenClaw'. | Mistranslation of “龙虾” which in this context refers to OpenClaw; also “crab” instead of “lobster” and misses product name nuance. | Incorrect translation of '龙虾火出圈了' as 'The Dragon Crab is out of the bag' instead of 'OpenClaw has gone mainstream' | Mistranslation of slang '火出圈' (gone mainstream/viral) as literal 'out of the bag'. '龙虾' (Lobster) is a slang term for 'OpenClaw' and should not be translated literally as 'Dragon Crab'. 'openclow' is a typo for 'OpenClaw'. Content is missing. |
| CohereLabs/tiny-aya-water | Title: The Dragon Crab is out of the bag today, two colleagues asked me about openclow |
严重
[准确性]
"The Dragon Crab is out of the bag"
理由: Mistranslation of the product name and idiom: should be about OpenClaw going mainstream, not a literal dragon crab or ‘out of the bag’. | Mistranslation of '龙虾' (lobster/OpenClaw). Should be 'OpenClaw' not 'Dragon Crab'. The source uses '龙虾' as slang for OpenClaw AI model. | Mistranslation of slang '火出圈' (gone mainstream/viral) as literal 'Dragon Crab' and incorrect idiom 'out of the bag'. Missed context that '龙虾' refers to OpenClaw. | Mistranslation: '龙虾' should be 'OpenClaw' (a software), not 'Dragon Crab'; '火出圈了' means 'gone mainstream/popular', not 'out of the bag'; 'openclow' is misspelled, should be 'OpenClaw' | The translation of '龙虾火出圈了' as 'The Dragon Crab is out of the bag' is slightly off. It should be 'The Dragon Crab has gone mainstream' or similar. |
| Qwen/Qwen2.5-14B-Instruct | Title: The lobster has gone viral, two colleagues approached me about openclow today Content: It's really puzzling, is there some mysterious force driving this? |
严重
[准确性]
"The lobster has gone viral"
理由: Misinterprets the metaphorical ‘龙虾’ which in this context refers to OpenClaw rather than literal lobster. | Mistranslation of slang '龙虾' which refers to the software 'OpenClaw', not the animal. | Should be 'OpenClaw has gone mainstream' - 'lobster' is slang metaphor that should be translated as the actual product name | Incorrect translation: '龙虾' should be 'OpenClaw' (a software name), not literal 'lobster' |
| Qwen/Qwen2.5-7B-Instruct | Title: Lobster is Making Waves Today, Two Colleagues Chatted with Me About OpenClaw Content: It’s truly perplexing, is there any mysterious force driving this? |
严重
[准确性]
"Lobster is Making Waves"
理由: Should be 'OpenClaw has gone mainstream' not 'Lobster is Making Waves Today'. The source uses '龙虾' (lobster) as slang for OpenClaw, but the reference clarifies this should be translated as 'OpenClaw' in English. | Misinterprets “龙虾火出圈了” which refers to OpenClaw (slang ‘lobster’) becoming popular, not literal lobster making waves. | Incorrect translation of '龙虾' - should be 'OpenClaw' (a software name), not 'lobster' (food). The source uses '龙虾' as a slang/abbreviation for OpenClaw. | Mistranslation of slang '龙虾' (Lobster) as the animal instead of the project name 'OpenClaw'. '火出圈' is translated as 'Making Waves' which is acceptable but less precise than 'gone mainstream'. 'openclow' typo in source ignored, but 'OpenClaw' in hypothesis is correct contextually, however the title structure implies the animal is making waves. |
| Qwen/Qwen3-14B | Il granchio è diventato virale, oggi due colleghi mi hanno parlato di openclow |
严重
[准确性]
"Il granchio è diventato virale"
理由: Misinterprets “龙虾” (lobster / OpenClaw) as “granchio” (crab) and omits the OpenClaw naming pun from both title and content. | Source says '龙虾' (lobster/OpenClaw), not '螃蟹' (crab). Hypothesis translates to Italian instead of target language Chinese. | Mistranslation: '龙虾' (lobster/OpenClaw) should be 'OpenClaw' or 'Lobster (OpenClaw)', not 'granchio' (crab). The source refers to the software OpenClaw, not actual lobster or crab. | Mistranslation of slang '龙虾' (Lobster/OpenClaw) as 'crab' instead of 'lobster' or the specific tool name. |
| Qwen/Qwen3.5-4B | Title: Lobster has gone viral; today two colleagues asked me about openclow Content: It's truly baffling, what mysterious force is driving this? |
严重
[准确性]
"Lobster has gone viral"
理由: Misinterprets the metaphorical use of “龙虾” which here refers to OpenClaw, not literal lobster. | Incorrect translation: 'openclow' should be 'OpenClaw' (the software name). Also 'Lobster' is incorrect - the source uses '龙虾' which is a slang term for OpenClaw, not literal lobster. | Mistranslation of slang '龙虾' which refers to the software 'OpenClaw', not the animal.
严重
[准确性]
"openclow"
理由: Failed to correct the typo in the source ('openclow' should be 'OpenClaw') and retained the error. | Should be 'OpenClaw' (capitalized correctly). The source shows 'openclow' which is a misspelling that should be corrected to the proper product name | Misspelling of product name OpenClaw leads to incorrect terminology. |
| Qwen/Qwen3.5-9B | Title: Lobsters have gone viral today; two colleagues asked me about OpenClow Content: It's truly baffling, is there some mysterious force behind this? |
严重
[准确性]
"Lobsters"
理由: Mistranslation of slang term '龙虾' which refers to the software 'OpenClaw', not the animal. | Incorrect translation: 'OpenClow' should be 'OpenClaw' (typo) | Incomplete translation. Source says '龙虾火出圈了' which means 'Lobsters have gone mainstream/viral' but the hypothesis adds 'today' which is not in source. Reference correctly translates as 'OpenClaw has gone mainstream' | Misinterprets the metaphorical use of “龙虾/火出圈/openclow(OpenClaw)” as literal lobsters instead of the OpenClaw project, changing the topic. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Clam has gone viral, my two colleagues found me to chat about openclow today. |
轻微
[准确性]
"Clam"
理由: Misinterprets “龙虾” which here refers to the OpenClaw concept, not clams. | Source says '龙虾' (lobster/shrimp), not clam. Should be 'Lobster' or 'OpenClaw' based on context | Mistranslation: '龙虾' (lobster/shrimp) was incorrectly translated as 'Clam' (蛤蜊). The reference correctly uses 'OpenClaw'. | Mistranslation of slang '龙虾' (Lobster) which refers to 'OpenClaw'. Translating as 'Clam' loses the specific reference and meaning.
严重
[准确性]
"openclow"
理由: Misspelling of the product/name “OpenClaw” and omission of its significance. | Incorrect translation: 'openclow' should be 'OpenClaw'. Also, the phrase 'found me to chat' is awkward and doesn't match the reference 'came to talk to me'. | Typo in the proper noun 'OpenClaw' (source says 'openclow' likely as a typo/slang, but reference corrects it; hypothesis retains the error or fails to recognize the entity). |
| google/gemma-3-12b-it | Title: Lobster is trending, two colleagues asked me about openclow today Content: It's really baffling, what mysterious force is driving this? |
严重
[准确性]
"openclow"
理由: Mistranscription of the product name; should match OpenClaw/OpenClaw-related pun from source/reference. | Should be 'OpenClaw' (capitalized correctly). The hypothesis preserves the lowercase typo from source instead of correcting to proper terminology | Failed to correct the typo in the source 'openclow' to the intended entity 'OpenClaw'. | Mistranslation: 'openclow' should be 'OpenClaw' (the software name)
严重
[准确性]
"Lobster is trending"
理由: Mistranslation: 'Lobster' should be 'OpenClaw' (the software name, not a literal translation of the Chinese term) | Source implies OpenClaw (lobster as nickname) going viral, not lobsters in general trending; product reference is lost. | Mistranslation of slang '龙虾' which refers to the software 'OpenClaw', not the animal. |
| google/gemma-3-1b-it | The shellfish fire out of the circle today, two colleagues asked me to talk about openclow. |
严重
[准确性]
"The shellfish fire out of the circle"
理由: Incorrect translation of '龙虾火出圈了'. Should be 'OpenClaw has gone mainstream' not 'shellfish fire out of the circle'. '龙虾' is slang for OpenClaw, not literal shellfish. | Complete mistranslation - '龙虾' (lobster/OpenClaw) was translated as 'shellfish fire', '火出圈' (go viral) was incorrectly translated as 'fire out of the circle', '神秘力量' was omitted entirely | Incorrect translation of '龙虾' as 'shellfish' instead of 'lobster'. | Literal translation of slang '火出圈' (gone viral/mainstream) and '龙虾' (OpenClaw codename) results in nonsensical meaning. Failed to translate the Title content properly. | Mistranslation of slang "龙虾火出圈了"; should convey that OpenClaw (lobster) has become very popular or gone viral, not literal shellfish and fire out of the circle.
严重
[准确性]
"openclow"
理由: Typo in proper noun 'OpenClaw'. Also failed to translate the Content sentence. | Should be 'OpenClaw' (proper capitalization), not 'openclow' | Name is misspelled; should be OpenClaw or equivalent. |
| google/gemma-3-4b-it | Lobster craze is going viral! Two colleagues found me to chat about openclow today. |
严重
[准确性]
"Lobster craze is going viral!"
理由: The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). Additionally, 'openclow' is a mistranscription of the slang term '龙虾' (Lobster/OpenClaw). | Missing context about 'openclow' being a product/tool name (OpenClaw). The hypothesis treats it as lowercase 'openclow' while reference shows it should be 'OpenClaw'. Also 'found me to chat' is awkward; should be 'came to talk to me'. | Misinterprets the title; "龙虾" here is a slang reference to OpenClaw, not literal lobsters. |
| google/translategemma-12b-it | Title: Lobster is trending, two colleagues asked me about openclow today. |
严重
[准确性]
"openclow"
理由: Incorrect translation: 'openclow' should be 'OpenClaw' (the software name). Also missing the Content part entirely. | Mistranscription of the product name; should match OpenClaw/ArkClaw-related term as in reference. | Failed to correct the typo in the source ('openclow' should be 'OpenClaw') and retained the error. | Should be 'OpenClaw' not 'openclow'. The reference uses the correct capitalization. |
| google/translategemma-4b-it | Lobster craze is trending! Two colleagues asked me about openclow today. |
严重
[准确性]
"Lobster craze is trending!"
理由: The hypothesis translates only the Title and omits the Content entirely. It also mistranslates 'openclow' (a typo for OpenClaw) literally instead of recognizing the entity, and misinterprets '火出圈了' as 'craze is trending' rather than 'gone mainstream'. | Missing title translation and incomplete content translation. Should include both title and content sections. 'openclow' should be 'OpenClaw' (proper noun capitalization). | Should be 'OpenClaw' to match the reference. | Misunderstanding of source - '龙虾' is slang for OpenClaw (AI agent), not literal lobster; '火出圈了' means 'gone mainstream/viral', not 'craze trending' | Source title refers to OpenClaw (a model) via the lobster meme; the main topic ‘OpenClaw’ is omitted and lobster is literalized instead of mapped to OpenClaw. |
| tencent/HY-MT1.5-1.8B | The term '龙虾' has become popular among the LinuxDo community. Today, two colleagues approached me to discuss OpenCL. |
严重
[准确性]
"The term '龙虾' has become popular among the LinuxDo community."
理由: Source mentions '同事' (colleagues), not 'LinuxDo community'. The hypothesis incorrectly identifies the context. | Failed to translate the content sentence ('真是让人费解...'). Misinterpreted '龙虾' (slang for OpenClaw) as literal lobsters and 'openclow' as 'OpenCL' instead of the specific project name 'OpenClaw'. | Mistranslation: '龙虾' (lobster) should be 'OpenClaw' (a project name), not literal lobster. The source refers to OpenClaw, not a LinuxDo community. | Mistranslates '龙虾火出圈了' (OpenClaw went viral) and omits the puzzlement question; shifts focus from OpenClaw hype to generic popularity of the word '龙虾'. |
| tencent/HY-MT1.5-7B | "en": "Title: Lobster has gone viral; today two colleagues came to talk to me about OpenCLow.", "zh-TW": "Title: 龙虾火了;今天有兩位同事來找我談論OpenCLow。", "ja": "Title: ロブスターが話題になっている;今日、二人の同僚がOpenCLowについて話しに来た。", "ko": "Title: 랍스터가 화제가 되고 있습니다; 오늘 두 명의 동료가 OpenCLow에 대해 이야기하러 왔습니다.", "de":="Titel: Der Lobster ist in aller Munde; heute kamen zwei Kollegen, um mit mir über OpenCLow zu sprechen.","fr":"Titre: Le homard est devenu très populaire ; aujourd’hui, deux collègues sont venus me parler d’OpenCLow.","it":"Titolo: Il gambero è diventato molto di tendenza; oggi due colleghi sono venuti a parlarmi di OpenCLow."} |
严重
[准确性]
"龙虾火了;今天有兩位同事來找我談論OpenCLow。"
理由: The source text contains a typo 'openclow', but the reference corrects it to 'OpenClaw'. The hypothesis retains the typo or fails to normalize the entity name consistent with the domain context. | Should be 'OpenClaw' not 'OpenCLow'. The reference clearly shows 'OpenClaw' as the correct term. | “龙虾” is rendered literally as the animal rather than as a reference to the OpenClaw/ArkClaw series, deviating from the reference's 'OpenClaw'; the content sentence “真是让人费解,有什么神秘力量推动吗?” is also left untranslated. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Black and White公益站Claude model usage frequently reports errors, problem assistance Content: I used the claude-sonnet-4.6 model from the Black and White公益站, as well as opus-4.6. Normal chat works fine, but when it comes to tool use, it seems to directly crash. | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Black and White公益站Claude model frequently reports errors during usage, problem assistance | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Black and White Free Shared API Site Claude Model Frequently Crashes During Use, Need Help Content: I am using the claude-sonnet-4.6 model from the Black and White Community Site, also tried opus-4.6. Normal chatting works fine, but when it comes to tool usage, it just crashes. |
轻微
[术语]
"Black and White Free Shared API Site"
理由: Reference uses 'Black and White free API proxy site' which is more accurate than 'Free Shared API Site' | Source refers to ‘公益站’; ‘Free Shared API Site’ is understandable but deviates from reference’s more natural ‘free API proxy site’. | Missing 'proxy' in the translation | Inaccurate translation of '公益站' in this context; 'free API proxy site' (as in reference) is more precise for the community slang. | Inconsistent translation: '公益站' should be 'free API proxy site', not 'Free Shared API Site' |
| Qwen/Qwen2.5-7B-Instruct | Title: Black and White Public Proxy Claude Model Frequently Reports Errors during Use, Seeking Help Content: I have tried the black and white public proxy Claude-sonnet-4.6 model and opus-4.6 as well. Everything is fine with normal chatting, but when it comes to using tools, it seems to crash directly. |
轻微
[术语]
"Black and White Public Proxy Claude Model"
理由: Should be 'Black and White free API proxy site' not 'Black and White Public Proxy'. The reference uses more specific terminology. | Source means “黑与白公益站” = free/public API proxy site; “Public Proxy” is understandable but less precise than ‘free API proxy site’ in this context. | '公益站' is better translated as 'free API proxy site' or 'community station' rather than 'Public Proxy', which sounds more corporate. The reference uses 'free API proxy site'. | Inconsistent translation of '公益站' - reference uses 'free API proxy site', hypothesis uses 'Public Proxy' which is less accurate. Also 'Black and White' should be translated more naturally. |
| Qwen/Qwen3-14B | Sito proxy gratuito "Nero e Bianco" Modello Claude errori frequenti durante l'utilizzo, richiesta di aiuto per i problemi |
严重
[准确性]
"Sito proxy gratuito "Nero e Bianco" Modello Claude errori frequenti durante l'utilizzo, richiesta di aiuto per i problemi"
理由: Condenses full post into a title-like fragment and omits the detailed description about models used (claude-sonnet-4.6, opus-4.6), normal chatting working, and crashes on tool use. | Omission of the entire 'Content' section of the source text. | Hypothesis is in Italian, but target language is Chinese (zh). Should be translated to Chinese. |
| Qwen/Qwen3.5-4B | Title: Black & White Free Shared API Site Claude Model Frequently Reports Errors During Use, Seeking Help Content: I have tried the claude-sonnet-4.6 model from the Black & White free shared API site and also opus-4.6. Normal chatting works fine, but when involving tool use, it seems to crash directly. |
轻微
[术语]
"Black & White Free Shared API Site"
理由: Non-standard rendering of “黑与白公益站”, which is better translated as “Black and White free API proxy site”; meaning remains clear. | Translation is overly literal. 'Free Shared API Site' is verbose; reference uses 'free API proxy site' which is more natural. | Reference uses 'Black and White free API proxy site' which is more accurate terminology than 'Free Shared API Site' |
| Qwen/Qwen3.5-9B | Title: Black & White Free Shared API Site Claude Model Frequently Errors During Use, Seeking Help with Issue Content: I tried using the claude-sonnet-4.6 and opus-4.6 models from the Black & White free shared API site. Everything works fine for normal chatting, but when it comes to tool use, it seems to crash directly. | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Black and White Public Proxy Assistance with Claude Model Usage Issues |
严重
[准确性]
"Black and White Public Proxy Assistance"
理由: The hypothesis omits the core issue 'Frequent errors' and the intent 'asking for help' (问题求助), changing the title from a help request to a generic statement. | Misrenders “黑与白公益站” which refers to a specific free API proxy/site, not generic “public proxy assistance” and omits the site aspect. | Missing key information about 'frequent errors' and 'asking for help' from source title |
| google/gemma-3-12b-it | Title: Frequent Errors When Using Claude Model on Black and White Public Proxy Site, Seeking Help Content: I am using the claude-sonnet-4.6 model from the Black and White public proxy site, and I have also tried opus-4.6. Everything is fine with normal chat, but when it comes to tool use, it seems to crash directly. |
轻微
[术语]
"Black and White public proxy site"
理由: Reference specifies it is a free API proxy site; omitting “free API” slightly weakens the technical nuance. | Missing 'API' in 'public proxy site' compared to reference | Should be 'Black and White free API proxy site' to match reference terminology | Should be 'Black and White free API proxy site' to match reference terminology more precisely |
| google/gemma-3-1b-it | The Claude model used with the black and white公益站 is experiencing frequent errors during tool use. The problem is that it just crashes directly. |
严重
[准确性]
"公益站"
理由: Term indicating a free/public-benefit proxy/API site is omitted; translation misses that this is a free/public service site. | Incomplete translation - '公益站' (free API proxy site) was left untranslated, 'frequent errors' is better than reference but 'during tool use' is accurate | Failed to translate '公益站' (free API proxy site/public welfare station), leaving Chinese characters in English text. |
| google/gemma-3-4b-it | Title: Black & White Public API Site Claude Model Frequent Errors - Seeking Assistance |
严重
[准确性]
"Black & White Public API Site Claude Model Frequent Errors - Seeking Assistance"
理由: Incorrect translation: 'Black & White' should be 'Black and White' (free API proxy site name); 'Public API Site' is awkward | The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). The content section is also missing entirely. | Incomplete translation - only translates the title, missing the content entirely. Also 'Public API Site' is less accurate than 'free API proxy site'. |
| google/translategemma-12b-it | Title: Black and White Public Proxy - Frequent Errors When Using Claude Model, Seeking Help Content: I am using the claude-sonnet-4.6 model from the Black and White public proxy, and I have also tried opus-4.6. Everything is fine with normal chat, but when it comes to tool usage, it seems to crash directly. |
轻微
[术语]
"Black and White Public Proxy"
理由: Reference uses 'Black and White free API proxy site' which is more specific and accurate than 'Public Proxy'. | Should be 'Black and White free API proxy site' | Less precise than "free API proxy site"; could cause misunderstanding about the service type. |
| google/translategemma-4b-it | Title: Frequent errors when using Black & White Public API Claude model, seeking help |
轻微
[术语]
"Black & White Public API Claude model"
理由: Should be 'Black and White free API proxy site' or 'Black and White public proxy site' for better accuracy and consistency with reference. | Should be 'free API proxy site' to match the reference. | Inconsistent translation - reference uses 'free API proxy site', hypothesis uses 'Public API' which loses the 'free' meaning | ‘公益站’ is closer to ‘free site’ or ‘public welfare site’; ‘Public API’ is somewhat misleading though gist is understandable. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of 'Title: 黑与白公益站Claude模型使用过程中频繁报错,问题求助' in en", "zh-TW": "Translation of 'Title: 黑與白公益站Claude模型使用過程中頻繁報錯,問題求助' in zh-TW", "ja": "Translation of 'Title: 黒と白の公益ステーションClaudeモデルの使用過程で頻繁にエラーが発生し、問題を相談する' in ja", "ko": "Translation of 'Title: 검은색과 흰색의 공익 스테이션 Claude 모델 사용 과정에서 자주 오류가 발생하여 문제를 도와드립니다' in ko", "de": "Translation of 'Title: Schwarze und weiße Gemeinschaftsstation Claude-Modell während des Einsatzes häufig Fehler auftreten, Probleme um Hilfe bitten' in de", "fr": "Translation of 'Title: Station publique noire et blanche Claude utilisant le modèle Claude-Sonnet 4.6, opus 4.6 également testé. Il n'y a aucun problème avec la conversation normale, mais lorsqu'il s'agit d'utiliser l'outil, il semble que cela échoue directement.' in fr", "it": "Translation of 'Title: Modello Claude-Sonnet 4.6 e opus 4.6 utilizzato nella stazione pubblica nera e bianca. Anche i modelli Claude-Sonnet 4.6 e Opus 4.6 sono stati provati. Non ci sono problemi con le conversazioni normali, ma quando si tratta di utilizzare l'strumento, sembra che si verifichi un fallimento diretto.' in it" }, {" " : "" } | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | "en": "Title: Frequent Errors Occuring While Using the Claude Model on the Black and White Public Welfare Site - Seeking Help", "zh-TW": "标题:在黑白公益网站上使用Claude模型时频繁出现错误——寻求帮助", "ja": "タイトル:黒と白公益サイトでClaudeモデルを使用中に頻繁にエラーが発生する——助けを求めます", "ko": "제목: 흑백 공익사이트에서 Claude 모델 사용 중 자주 오류가 발생합니다 — 도움을 요청합니다", "de":="Titel: Häufige Fehler beim Einsatz des Claude-Modells auf der Black and White Public Welfare Website – Hilfe gesucht", "fr":="Titre : Erreur fréquente lors de l’utilisation du modèle Claude sur le site public d’aide au développement Black and White – Demande d’assistance", "it":="Titolo: Errori frequenti nell’utilizzo del modello Claude sul sito di pubblica assistenza Black and White – Richiesta di aiuto"}|im_end|> | 通过 (无共识错误) |
| Model Under Test | MT Output (Hypothesis) | Adjudication Result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Login to your account |
致命
[准确性]
"Login to your account"
理由: Complete mistranslation - completely unrelated to source content. Source is about choosing a car, hypothesis is about login. | Completely unrelated to the source content about asking friends to help choose a car; full mistranslation. | Completely incorrect translation; source is about choosing a car, not a login prompt | Entirely incorrect translation of the source content | Complete hallucination; the output has no relation to the source text which asks for help choosing a car. |
| CohereLabs/tiny-aya-water | Title: Let's pick some cars, guys |
严重
[准确性]
"Let's pick some cars, guys"
理由: Mistranslation: '佬友们' is slang for 'bros/buddies', not 'guys'; missing context about choosing from specific cars with 360-degree camera feature | Incomplete translation. Missing the key phrase '佬友们' (bros/guys) and the imperative 'come in/help'. Should be 'Bros, come in and help me choose a car' | Loses the nuance of asking ‘bros’ to come in and help choose among specific options, but core meaning is largely preserved. |
| Qwen/Qwen2.5-14B-Instruct | Title: Friends, come help me choose a car Content: Friends, come take a look at which one of these cars is better. I found a few that are cheaper and have 360-degree cameras. If I must choose from these options, which one should I pick and why? |
轻微
[风格]
"Friends"
理由: Should be 'Bros' to better capture the informal tone of '佬友们' | Less colloquial than 'Bros' | Loss of register; '佬友们' is informal slang better rendered as 'Bros' or similar, whereas 'Friends' is too neutral. | Translation style: 'Friends' vs 'Bros' - less colloquial than reference |
| Qwen/Qwen2.5-7B-Instruct | Old friends, come help pick a car |
严重
[准确性]
"Old friends"
理由: Incorrect translation of '佬友们' - should be 'Bros' or similar informal address, not 'Old friends'. Also missing 'Title:' label and content translation. | Missing context and details from the source | Incomplete translation. Only translates the title, missing the entire content. Reference provides full translation of both title and content. | “佬友们” refers to forum bros/guys, not ‘old friends’; loses colloquial tone and social nuance. | Missing the content part of the source text entirely. The hypothesis only translates the title. |
| Qwen/Qwen3-14B | Fratelli, aiutatemi a scegliere l'auto |
严重
[准确性]
"Fratelli, aiutatemi a scegliere l'auto"
理由: Severe omission: The entire 'Content' section detailing the car selection criteria is missing. | Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. | Only renders a short title-like sentence and omits all details about several cheap cars with 360 cameras and the question of which to choose and why. | Content is missing. The hypothesis only translates the Title, but the Content contains important information about 'cheap ones with 360-degree camera' and 'which one and why' that is completely omitted. |
| Qwen/Qwen3.5-4B | Folks, come in and help pick a car. |
严重
[准确性]
"Folks, come in and help pick a car."
理由: Omits key information about cheap cars, 360-degree camera, constrained choice, and asking for reasons, losing important content. | Hypothesis only translates the title but omits the content entirely. The source has substantial content that should be translated | Missing Title field - source has '佬友们,进来帮忙选选车' but hypothesis only has content translation. Also 'Folks' is too generic; reference uses 'Bros' which captures the slang '佬友'. | Incomplete translation; omitted the entire content section regarding specific car features (360 camera) and the question. |
| Qwen/Qwen3.5-9B | Title: Friends, come help me choose a car Content: Friends, take a look at these few cars and tell me which one is better. I found several cheap ones with 360 cameras. If I have to pick from these, which one should I choose and why? |
轻微
[风格]
"Friends"
理由: Failed to capture the specific community slang tone of '佬友们' (Bros/Old hands), using a generic term instead. | Should be 'Bros' for colloquial tone | “佬友们/佬友” is colloquial/slang similar to “bros” or “dudes”; rendering as neutral “friends” loses the slang tone. | Less accurate slang translation. Source uses '佬友们' which is internet slang for 'bros' or 'dudes', not just 'Friends'. Reference correctly translates as 'Bros' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Friends, come in and help choose a car. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Hey folks, come in and help me choose a car. |
轻微
[准确性]
"Hey folks, come in and help me choose a car."
理由: Only title-level gist is translated; full content about multiple car options, 360 camera, constraints, and asking reasons is omitted. | Hypothesis only translates the title, missing the entire content. The source has both title and content that should be translated | Omission of the entire 'Content' section of the source text. | Missing Content translation entirely; only Title was translated |
| google/gemma-3-1b-it | Friends, welcome to look at these cars and see which ones are good. |
严重
[准确性]
"Friends, welcome to look at these cars and see which ones are good."
理由: Incomplete translation - missing details about 'cheap ones with 360-degree camera', 'which one should I pick, and why?' | Severe omission. Missed the specific constraints: 'cheap', 'with 360 camera', and the core question 'which one to pick and why'. | Missing translation of '哪个比较好'. | Only partially translates the content; omits the constraints about cheap cars with 360 camera, having to choose among several, and asking which to pick and why. | Incomplete translation. Missing the main content about choosing from cheap cars with 360-degree cameras and asking for reasons why |
| google/gemma-3-4b-it | Hey buddies, come in and help pick a car! |
严重
[准确性]
"Hey buddies, come in and help pick a car!"
理由: The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). The content section is missing. | Only covers the title idea; omits the details about several cheap cars with 360 cameras, the constraint to choose among them, and asking for reasons. | Incomplete translation - only translates title, missing the content. 'Hey buddies' is less formal than 'Bros'. Missing the specific context about 360-degree camera and decision-making aspect. |
| google/translategemma-12b-it | Hey everyone, come and help me choose a car. |
严重
[准确性]
"Hey everyone, come and help me choose a car."
理由: Omission of the detailed 'Content' section regarding car features (360 camera, price) and the specific question. | Incomplete translation. Missing the content about specific car models and 360-degree camera feature. Only translates the title, not the full source. | Title only translated; full content about multiple cheap cars with 360 camera, forced choice and asking why is entirely omitted. |
| google/translategemma-4b-it | Hey everyone, come help choose a car! |
严重
[准确性]
"Hey everyone, come help choose a car!"
理由: Misunderstanding of source - '佬友' is slang for 'bro/friend' in the community, not 'everyone'; '选车' refers to choosing from specific car options with 360 cameras, not general car choosing | Incomplete translation. Missing content section. '佬友们' is colloquial slang better translated as 'Bros' not 'Hey everyone'. Should include full content about car models and 360-degree camera. | The hypothesis translates only the Title and omits the entire Content. It also fails to capture the specific slang '佬友们' (Bros/Veterans), using a generic 'Hey everyone'. |
| tencent/HY-MT1.5-1.8B | Login |
致命
[准确性]
"Login"
理由: The hypothesis does not match the source or reference content. | Hypothesis is 'Login' which has no relation to the source content about choosing a car. This is completely incorrect. | Completely unrelated to the source; source is about asking bros to help choose a car, but hypothesis is a single unrelated word. | Hypothesis is completely incorrect - 'Login' has no relation to the source text about choosing a car. | The hypothesis is completely unrelated to the source text. It fails to translate the title or content regarding choosing a car. |
| tencent/HY-MT1.5-7B | Ladies and gentlemen, come in and help us choose a car. | 通过 (无共识错误) |
| Model Under Test | MT Output (Hypothesis) | Adjudication Result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Drawing】Draw one and there will be 50 days left until the due date of the annual payment airport (I am out of my depth, I want points) |
严重
[准确性]
"I am out of my depth, I want points"
理由: Mistranslation of “我摊牌了,我想要积分”, which means ‘I’ll be honest / I’m laying my cards on the table, I want points’, not ‘out of my depth’. | '我摊牌了' means 'I'll lay my cards on the table/be honest', not 'I am out of my depth' | Incorrect translation of '我摊牌了'; should be 'I'll lay my cards on the table' or 'I'll be honest' | Mistranslation of '机场' (slang for proxy/VPN service) as literal 'airport'. '摊牌了' (lay cards on the table) mistranslated as 'out of my depth'. Content is missing. |
| CohereLabs/tiny-aya-water | Title: A raffle! Draw one and there will be 50 days left until the end of the year subscription for airport (I have folded). |
严重
[准确性]
"Draw one and there will be 50 days left until the end of the year subscription for airport"
理由: Misinterprets ‘年付机场’ as ‘end of the year subscription’ instead of ‘an annual subscription’ from a proxy/VPN provider. | Incomplete translation. '年付机场' means 'annual proxy provider subscription', not just 'year subscription for airport' | Mistranslation: '机场' refers to 'proxy provider/service', not literal 'airport'; '我摊牌了' means 'I'll lay my cards on the table/be honest', not 'I have folded'; '抽奖' is 'lucky draw/giveaway', not just 'raffle'
严重
[准确性]
"(I have folded)."
理由: Mistranslation of ‘我摊牌了,我想要积分’, which means openly admitting wanting forum points, not ‘I have folded’ in a card-game sense. | Literal mistranslation of slang '摊牌' (lay cards on the table/be honest). | Incorrect translation of '我摊牌了'. Should be 'I'll lay my cards on the table' (meaning being honest about intentions), not 'I have folded' |
| Qwen/Qwen2.5-14B-Instruct | Title: [Draw] Draw one annual payment proxy provider with 50 days left (I'm putting all my cards on the table, I want points) Content: At 16:00 24 hours later (March 10th), use the draw tool within the site to start drawing! One lucky person will win! |
轻微
[术语]
"proxy provider"
理由: Should be 'annual proxy provider subscription' - more natural phrasing | Literal rendering of ‘机场’; acceptable but less natural than ‘proxy provider subscription’ or similar per reference. | Inconsistent: 'annual payment' vs 'annual subscription' in reference |
| Qwen/Qwen2.5-7B-Instruct | Title: [Draw] Pick one; there's 50 days left until expiration for annual proxy provider (I admit it, I want points) Content: 24 hours later (March 10th) at 16:00, use the station's lottery tool to draw and one lucky winner will be chosen! |
轻微
[风格]
"I admit it, I want points"
理由: Slightly different phrasing from reference but meaning is preserved. 'proxy provider' vs 'subscription'. | '摊牌了' is an idiom meaning 'lay my cards on the table' or 'come clean'. 'I admit it' is slightly weaker but acceptable. Reference uses 'lay my cards on the table'. | Should be 'I'll lay my cards on the table, I just want forum points' for more natural English expression. |
| Qwen/Qwen3-14B | 【Sorteggio】Estrai un proxy annuale con 50 giorni rimanenti (Sono sincero, voglio punti) |
严重
[准确性]
"Estrai un proxy annuale con 50 giorni rimanenti"
理由: Omission of the 'Content' section containing the draw time and mechanism. | Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. | Misinterprets as “draw a proxy” instead of giving away / lottery for an annual proxy subscription with 50 days left. |
| Qwen/Qwen3.5-4B | Title: [Giveaway] One lucky winner will be chosen from annual subscriptions expiring in 50 days (I'm revealing my hand, I want points). Content: At 16:00 on March 10th (after 24 hours), use our site's giveaway tool to participate. A lucky winner will be selected! |
轻微
[准确性]
"use our site's giveaway tool to participate"
理由: Slight nuance change: tool is used to perform the draw, not explicitly for user participation; impact is small. | Shifted agency. The source says the author will use the tool to draw, not that users should use it to participate. | Incorrect translation: 'use our site's giveaway tool to participate' - source says '使用站里的抽奖工具开抽' meaning 'use the lottery tool to draw', not to participate. Reference correctly translates as 'I will use the forum's lottery tool to draw'. |
| Qwen/Qwen3.5-9B | Title: [Giveaway] One annual proxy with 50 days remaining (I'm showing my hand, I want points). Content: At 16:00 on March 10th (after 24 hours), use the site's lottery tool to draw, and one lucky winner will be selected! | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Sweepstakes] Win a still valid 1-year proxy subscription (I'm revealing now, I want points) |
严重
[准确性]
"Win a still valid 1-year proxy subscription"
理由: Source indicates a remaining 50 days on an annual subscription, not a full valid 1-year term. | Misinterprets '还有 50 天到期的年付机场' (annual subscription with 50 days remaining). The hypothesis implies a full year validity, whereas the source specifies only 50 days are left on an annual plan. | Inaccurate translation: '还有50天到期的年付机场' means 'annual proxy subscription with 50 days left', not 'still valid 1-year'. |
| google/gemma-3-12b-it | Title: [Giveaway] Giveaway an annual proxy provider (expires in 50 days) (I admit it, I want points!) Content: In 24 hours (March 10th) at 16:00, use the site’s lottery tool to start the draw. One lucky winner will be selected! |
轻微
[风格]
"I admit it, I want points!"
理由: Less idiomatic than 'I'll lay my cards on the table' | 'I admit it' is less idiomatic for '摊牌了' in this context than 'I'll lay my cards on the table'. | Should be 'I'll lay my cards on the table, I just want forum points' for more natural English and context |
| google/gemma-3-1b-it | Draw a prize - one with a 50-day expiration date for an airport (I'm advertising, I want points). |
严重
[准确性]
"airport"
理由: Misinterpreted '机场' (slang for proxy provider) as literal airport. Missed 'annual payment' (年付). Phrasing is incoherent. | "机场" here refers to a proxy/VPN provider, not a literal airport; mistranslation of technical slang. | Incorrect translation - '机场' (proxy provider) translated as 'airport', '抽奖' (lucky draw) translated as 'draw a prize', '年付' (annual subscription) not translated, '50天到期' (50 days left) translated as '50-day expiration date'
严重
[准确性]
"I'm advertising, I want points"
理由: Incorrect translation of '我摊牌了,我想要积分'. Should be 'I'll lay my cards on the table, I just want forum points' not 'I'm advertising' | Missing specification of '积分' as 'forum points'. | Mistranslation of '摊牌了' (lay cards on the table/be honest). 'Advertising' changes the meaning entirely. |
| google/gemma-3-4b-it | Title: [Sweepstakes] Draw one with 50 days remaining until the due date for an annual installment proxy provider (I’m at my wit’s end, I want points). Content: Drawing will take place on March 10th at 4:00 PM within 24 hours using the lottery tool on the site! A lucky winner will be drawn! |
严重
[准确性]
"annual installment proxy provider"
理由: The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Annual installment' misinterprets '年付' (annual payment); 'at my wit's end' misinterprets '摊牌' (lay cards on the table/be honest). | "年付机场" means annual subscription proxy provider, not installment. | Incorrect translation: 'annual installment' should be 'annual subscription' | 'annual installment proxy provider' is inaccurate - should be 'annual proxy provider subscription'. 'I'm at my wit's end' is incorrect idiom - should be 'I'll lay my cards on the table'. The phrase should convey the user wants forum points, not that they're frustrated. |
| google/translategemma-12b-it | Title: [Giveaway] One year proxy subscription with 50 days remaining (I admit it, I want points!) Content: In 24 hours (March 10th) at 16:00, we will use the website's giveaway tool to conduct the draw. One lucky winner will be selected! | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: [Giveaway] Draw one with a year-long subscription expiring in 50 days (I'm out of options, I want points) |
严重
[准确性]
"I'm out of options, I want points"
理由: Incorrect translation of '我摊牌了'. Should be 'I'll lay my cards on the table' or 'I'm being honest'. Also missing content section entirely. | Incorrect translation - '我摊牌了' means 'I'll lay my cards on the table/be honest', not 'I'm out of options'; '机场' means 'proxy provider', not literal airport | The hypothesis translates only the Title and omits the Content. The translation of '我摊牌了' as 'I'm out of options' is inaccurate; it means 'I'll lay my cards on the table' or 'I admit it'. '机场' is contextually a proxy service, not literally an airport, though 'subscription' hints at it, the omission of content is the major failure. | Source ‘我摊牌了,我想要积分’ means ‘I’ll be honest, I want points’; ‘out of options’ adds unsupported nuance. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 【抽奖】抽一个还有50天到期的年付机场(我摊牌了,我想要积分)' in en |
致命
[准确性]
"Translation of 'Title: 【抽奖】抽一个还有50天到期的年付机场(我摊牌了,我想要积分)' in en"
理由: Meta-text describing a translation request, not an actual translation of the title. Entire content is missing. | The hypothesis is a meta-prompt describing a translation task rather than providing the actual translation. It also omits the content entirely. | Hypothesis is a placeholder, not an actual translation. Content translation is also missing. |
| tencent/HY-MT1.5-7B | "en": "Title: [Lottery] Draw a yearly airport subscription that is still valid for 50 days. (I’m admitting it, I want those points!)", "zh-TW": "标题:[抽奖] 抽中一份还剩50天有效期的年度机场订阅服务。(我承认了,我想要这些积分!)", "ja": "タイトル:[抽選会] 50日間有効な年間アエロポートサブスクリプションを抽けます。(白状しますが、ポイントが欲しいんです!)", "ko": "제목:[추첨회] 50일 남은 연간 공항 서브스크립션을 받아가세요!(솔직히 말하자면, 포인트가 필요합니다!)", "de":="Titel: [Ziehung] Gewinnen Sie eine jährliche Airport-Abonnementdienstleistung, die noch 50 Tage gültig ist. (Ehrlich gesagt, ich möchte diese Punkte!)", "fr":="Titre : [Loterie] Gagnez une abonnement annuelle à l’aéroport qui est encore valable pendant 50 jours. (Pour être honnête, je veux ces points !)", "it":="Titolo: [Lotteria] Vinca un abbonamento annuale all’aeroporto che è ancora valido per 50 giorni. (Onestamente, voglio questi punti!)"}|im_end|> |
严重
[准确性]
"yearly airport subscription"
理由: In context, “机场” refers to a proxy/VPN service, not a physical airport; it should be rendered as “机场订阅/代理服务订阅” (proxy service subscription). | In this context, '机场' (airport) is slang for a 'proxy provider' or 'SSR/V2Ray service'. Translating it literally as 'airport subscription' is factually incorrect regarding the service being offered. | Should be 'annual proxy provider subscription' not 'yearly airport subscription'. '机场' is slang for proxy provider. |
| Model Under Test | MT Output (Hypothesis) | Adjudication Result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Byte Release Crab: ArkClaw |
严重
[准确性]
"Crab"
理由: Mistranslation of '龙虾' (slang for OpenClaw) as literal 'Crab'. '字节' should be 'ByteDance'. Content is missing. | Mistranslation of “龙虾” (lobster / OpenClaw metaphor); ‘Crab’ is incorrect. | Missing specific term 'ByteDance' and incorrect translation of '龙虾' as 'Crab' instead of 'OpenClaw' | '字节发布龙虾' - '龙虾' is nickname for OpenClaw, should be 'ByteDance released OpenClaw' or 'ByteDance released their version of OpenClaw: ArkClaw'. 'Release' should be 'released' and 'Crab' is wrong for '龙虾' (OpenClaw software). | Source says '龙虾' (lobster) which should be 'OpenClaw' based on context, not 'Crab' |
| CohereLabs/tiny-aya-water | Title: Byte Release Crab: ArkClaw |
轻微
[准确性]
"Byte Release Crab: ArkClaw"
理由: Brand name and key term mistranslated: should be ‘ByteDance released Dragon Lobster/OpenClaw: ArkClaw’ or similar; ‘Crab’ and omission of ‘Dance’ are misleading. | Mistranslation: '字节' refers to 'ByteDance' (company name), not 'Byte Release'; '龙虾' is slang for 'OpenClaw', not literal 'Crab' | Mistranslation of '龙虾' as 'Crab'. Should be 'OpenClaw'. Also missing context that this is ByteDance's version. Should be 'ByteDance released their version of OpenClaw: ArkClaw' | Mistranslated '字节' (ByteDance) as 'Byte' and '龙虾' (OpenClaw slang) as literal 'Crab'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Byte Releases Lobster: ArkClaw Content: Byte Releases Lobster: ArkClaw |
严重
[准确性]
"Byte"
理由: Should be 'ByteDance released their version of OpenClaw: ArkClaw' - missing context that this is ByteDance's version of OpenClaw | Source refers to ‘字节’, meaning the company ByteDance; truncating to ‘Byte’ changes the entity. | Mistranslation of slang '龙虾' referring to 'OpenClaw'. | Incorrect: 'Lobster' should be 'their version of OpenClaw' - '龙虾' is a nickname for OpenClaw, not literal lobster |
| Qwen/Qwen2.5-7B-Instruct | Byte releases ArkClaw: |
严重
[准确性]
"Byte releases ArkClaw:"
理由: Missing context. Should be 'ByteDance released their version of OpenClaw: ArkClaw' to clarify the relationship between ArkClaw and OpenClaw. | Missing 'Title:' label and incomplete translation. The reference shows 'ByteDance released their version of OpenClaw: ArkClaw' but hypothesis only has 'Byte releases ArkClaw:' - missing context about 'their version of OpenClaw'. | Omitted “龙虾” and that ArkClaw is ByteDance’s OpenClaw variant; should convey “ByteDance released their version of OpenClaw: ArkClaw” for full meaning. | Incomplete translation. Misses '龙虾' (OpenClaw) comparison and the full structure. '字节' should be 'ByteDance'. |
| Qwen/Qwen3-14B | Title: Byte Release Lobster: ArkClaw Content: Byte Release Lobster: ArkClaw |
严重
[准确性]
"Byte"
理由: Mistranslation: '字节发布' means 'ByteDance released', not 'Byte Release'. Also '龙虾' refers to 'OpenClaw' (the software), not literal 'lobster'. Should be 'ByteDance ha rilasciato ArkClaw'. | Source refers to 字节 (ByteDance company), but translation uses generic “Byte”, losing the specific company reference. | Source says '字节' (ByteDance), hypothesis says 'Byte'. Also, '龙虾' is a slang term for OpenClaw, not literal 'Lobster'. Should clarify this is ByteDance's version of OpenClaw. | Hallucination/Language Error: The hypothesis is in English, but the target language is Chinese (zh). It also mistranslates '字节' as 'Byte' instead of 'ByteDance' and treats 'Lobster' literally instead of as the codename. |
| Qwen/Qwen3.5-4B | ByteDance Releases Lobster: ArkClaw |
严重
[准确性]
"Lobster: ArkClaw"
理由: 'Lobster' is incorrect - '龙虾' in source is slang for OpenClaw, not literal lobster. Should be 'their version of OpenClaw' as in reference. | “龙虾” is a playful reference to OpenClaw; hypothesis misses that this is ByteDance’s OpenClaw-like product, losing that nuance. | Mistranslation of slang '龙虾' referring to 'OpenClaw' software. |
| Qwen/Qwen3.5-9B | Title: ByteDance Releases ArkClaw: The Lobster Content: ByteDance Releases ArkClaw: The Lobster |
严重
[准确性]
"The Lobster"
理由: Adds an appositive “The Lobster”, implying ArkClaw itself is a lobster, while source only says “发布龙虾:ArkClaw”, i.e. ByteDance’s OpenClaw-like project; this alters the nuance and introduces unsupported content. | Inaccurate context. The source title doesn't indicate this is ByteDance's version of OpenClaw. Reference correctly contextualizes it as 'ByteDance released their version of OpenClaw: ArkClaw' | Incorrect translation: 'The Lobster' is added; reference says 'ByteDance released their version of OpenClaw: ArkClaw' - the term 'lobster' is a slang for OpenClaw and should not be translated literally | Mistranslation of slang '龙虾' as the animal instead of recognizing it as a nickname for the software project (OpenClaw). |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | ByteDance releases ArkClaw: shrimp. |
致命
[准确性]
"ByteDance releases ArkClaw: shrimp."
理由: Critical mistranslation: '龙虾' in this context refers to 'OpenClaw' (a software project), not literal shrimp. The reference correctly uses 'OpenClaw'. This is a terminology error - '龙虾' is a slang term for OpenClaw in the Chinese tech community. | Word order is wrong and '龙虾' (shrimp/lobster) should not appear separately. Should be 'ByteDance releases their version of OpenClaw: ArkClaw' or similar | Inverts the relationship: should be ‘ByteDance releases Lobster: ArkClaw’ or ‘ByteDance releases ArkClaw (Lobster)’. Also mistranslates 龙虾 as ‘shrimp’ and adds a confusing trailing ‘shrimp.’ | Mistranslation of '龙虾' (Lobster). In this context, it refers to the software 'OpenClaw'. Translating literally as 'shrimp' (which is also biologically incorrect for 'lobster') destroys the meaning. |
| google/gemma-3-12b-it | Title: Byte releases Lobster: ArkClaw Content: Byte releases Lobster: ArkClaw |
严重
[准确性]
"Byte releases Lobster: ArkClaw"
理由: Mistranslation: 'Lobster' should be 'their version of OpenClaw' - the Chinese '龙虾' is a nickname/slang for OpenClaw, not a literal translation | Mistranslation of slang '龙虾' which refers to the software 'OpenClaw'. | Source emphasizes ByteDance and product name; “Byte” is an incomplete rendering of ByteDance and misses the OpenClaw-related nuance given in reference.
严重
[术语]
"Lobster"
理由: Incorrect translation of '字节' as 'Byte' instead of 'ByteDance' | Should be 'OpenClaw' not 'Lobster'. The source uses '龙虾' as slang for OpenClaw | '字节' in this context refers to the company 'ByteDance', not the unit of data 'Byte'. |
| google/gemma-3-1b-it | Byte Release Shrimp: ArkClaw |
严重
[准确性]
"Byte Release Shrimp"
理由: Incomplete translation. Missing context that this is ByteDance's version of OpenClaw. Should be 'ByteDance released their version of OpenClaw: ArkClaw' | Mistranslation of '字节' (ByteDance) and '龙虾' (OpenClaw codename). 'Shrimp' is incorrect in this tech context. | Mistranslation of "字节发布龙虾"; should be ByteDance released Lobster/OpenClaw; shrimp is incorrect and company name shortened incorrectly. | Incorrect translation - '字节' (ByteDance) translated as 'Byte', '龙虾' (shrimp/lobster) should be 'OpenClaw' not 'Shrimp' | Incorrect translation of '字节发布龙虾' which should be 'ByteDance released their version of OpenClaw'. |
| google/gemma-3-4b-it | Byte Releases Lobster: ArkClaw |
严重
[准确性]
"Byte Releases Lobster: ArkClaw"
理由: The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Lobster' is a literal translation of the slang '龙虾' which refers to 'OpenClaw' in this context, causing confusion. | Missing critical context - should specify 'ByteDance released their version of OpenClaw' not just 'Byte Releases Lobster'. The reference shows this is about ByteDance's version of OpenClaw, not a standalone product. | Incomplete: 'Byte' should be 'ByteDance' (full company name) | Treats "龙虾" literally as lobster; in context it refers to OpenClaw or a claw-themed model from ByteDance. |
| google/translategemma-12b-it | Title: Byte releases ArkClaw (Lobster) |
严重
[准确性]
"Byte releases ArkClaw (Lobster)"
理由: Adds "(Lobster)" which is not present in the source and may imply an explanatory nickname not given. | Missing critical context. Reference states 'ByteDance released their version of OpenClaw: ArkClaw' which clarifies the relationship between ArkClaw and OpenClaw. | Mistranslation of '龙虾' (slang for ArkClaw/OpenClaw) as the animal. | Incorrect: 'Byte releases' should be 'ByteDance released' (company name). Also 'Lobster' in parentheses is a literal translation of 龙虾 but the reference uses 'their version of OpenClaw'. |
| google/translategemma-4b-it | Title: Byte publishes Lobster: ArkClaw Content: Byte publishes Lobster: ArkClaw |
严重
[准确性]
"Byte"
理由: The hypothesis translates '龙虾' (Lobster) literally, failing to recognize it as the slang term for 'OpenClaw' used in this context. It also omits the repetition in the source which acts as Content, though the main issue is the entity translation. | Source text '字节发布龙虾' should be translated as 'ByteDance released their version of OpenClaw' not 'Byte publishes Lobster'. The hypothesis misses the context that this refers to ByteDance's product release and incorrectly translates '龙虾' as 'Lobster' instead of recognizing it as a codename/product name equivalent to 'OpenClaw'. | Incorrect translation - '龙虾' is slang for OpenClaw, not literal lobster; should be 'ByteDance released their version of OpenClaw: ArkClaw' | ‘字节’ refers to ByteDance, not ‘Byte’; company name mistranslated. |
| tencent/HY-MT1.5-1.8B | The title 'Byte Release Lobster: ArkClaw' refers to a project or activity related to byte releases and lobsters. |
严重
[术语]
"Byte Release Lobster"
理由: Failed to recognize '龙虾' as the slang term for 'OpenClaw' (or similar project) and '字节' as 'ByteDance'. Translated literally as 'Lobster'. | Incorrect translation of '龙虾' as 'lobster' instead of 'OpenClaw'. | Mistranslates 字节 (ByteDance) and 龙虾 (OpenClaw slang); should refer to ByteDance and OpenClaw, not literal bytes and lobsters.
严重
[准确性]
"refers to a project or activity related to byte releases and lobsters"
理由: Hallucinated explanation; the source just states the release of ArkClaw as 龙虾, no description of a project or activity. | Mistranslation: '龙虾' should be 'OpenClaw' (the project name), not literal 'lobster'. 'Byte Release' should be 'ByteDance released'. | Source uses '龙虾' which is slang for 'OpenClaw', not literal 'lobster'. The hypothesis translates it literally instead of recognizing the technical term. | The hypothesis is a hallucinated explanation rather than a translation of the source text. |
| tencent/HY-MT1.5-7B | Byte Publishing Lobster: ArkClaw |
致命
[准确性]
"Byte Publishing Lobster"
理由: The hypothesis translates '字节' as 'Byte Publishing' (incorrect entity) and '龙虾' literally as 'Lobster'. The reference correctly identifies 'ByteDance' and the product 'OpenClaw'. The translation completely fails to identify the real-world entities. | “字节发布” should be read as “字节跳动发布” (ByteDance released); “龙虾” is a nickname for the OpenClaw family, and the connection to OpenClaw shown in the reference also needs to be conveyed. This translation both mistranslates the subject and fails to convey the OpenClaw connection. | Should be 'ByteDance released their version of OpenClaw: ArkClaw'. Missing 'ByteDance' and 'their version of OpenClaw' context. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: If phones can raise crabs |
严重
[准确性]
"crabs"
理由: '小龙虾' is slang for OpenClaw (AI agent software), not literal small crab. Should be 'If smartphones could run OpenClaw'. | Mistranslation of “龙虾” which in this context refers to running OpenClaw instances, not literal crabs. | Source says '龙虾' which should be 'OpenClaw' based on context, not 'crabs' | Mistranslation of '养龙虾' (host/run OpenClaw) as literal 'raise crabs'. Content is missing. |
| CohereLabs/tiny-aya-water | Title: If phones could breed crabs |
严重
[准确性]
"breed crabs"
理由: Mistranslation of '养龙虾'. '龙虾' is slang for OpenClaw AI model, not literal crabs. Should be 'run OpenClaw' | Core concept mistranslated: ‘养龙虾’ here refers to running/hosting OpenClaw, not breeding crabs. | The translation of '龙虾' as 'crabs' is incorrect. It should be 'OpenClaw' as in the reference. | Mistranslated slang '养龙虾' (host/run OpenClaw) as literal 'breed crabs'. | Mistranslation: '小龙虾' is slang for 'OpenClaw' (AI agent software), not literal 'crab'; '养' means 'host/run' (software), not 'breed' |
| Qwen/Qwen2.5-14B-Instruct | Title: If phones could raise lobsters Content: Then old phones could all be put to use, and there would also be visual capabilities. |
严重
[准确性]
"raise lobsters"
理由: In this forum context ‘养龙虾’ refers to running/hosting OpenClaw, not literally raising lobsters. | Should be 'If smartphones could run OpenClaw' - 'lobster' is slang for OpenClaw and should be translated as such | Literal mistranslation of slang '养龙虾' which means running/hosting OpenClaw. | Incorrect: 'raise lobsters' should be 'run OpenClaw' - '养龙虾' is a metaphor for hosting OpenClaw
轻微
[流畅性]
"there would also be visual capabilities"
理由: Slightly awkward phrasing; 'vision capabilities' is more natural in this tech context. | Slightly different from reference: 'there would also be visual capabilities' vs 'they even have vision capabilities' | Should be 'they even have vision capabilities' - more natural phrasing |
| Qwen/Qwen2.5-7B-Instruct | Title: If phones could keep lobsters Content: Then old phones could all be put to use, and they still have visual capabilities. |
严重
[准确性]
"keep lobsters"
理由: Incorrect translation of '龙虾' - should be 'OpenClaw' (a software), not 'lobsters' (food). The source uses '龙虾' as slang for OpenClaw. | Should be 'If smartphones could run OpenClaw' not 'keep lobsters'. The slang '养龙虾' means running/hosting OpenClaw, not keeping lobsters. | Mistranslation of slang '养龙虾' (raise/host OpenClaw). Translating as keeping actual lobsters changes the meaning entirely. | “养龙虾” is community slang for running/hosting OpenClaw; literal ‘keep lobsters’ misrepresents intended meaning. |
| Qwen/Qwen3-14B | Se i telefonini potessero allevare gamberi |
严重
[准确性]
"Se i telefonini potessero allevare gamberi"
理由: Mistranslation of slang '养龙虾' (host/run OpenClaw) as 'raise shrimp/crustaceans'. Also omits the 'Content' section. | Interprets “养龙虾” literally as breeding lobsters/shrimps instead of the OpenClaw software metaphor. | Mistranslation: '龙虾' refers to OpenClaw (software), not literal 'gamberi' (shrimp/prawns). Should be 'Se gli smartphone potessero eseguire OpenClaw'. | Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. |
| Qwen/Qwen3.5-4B | If phones can raise crayfish, then old phones can be fully utilized, along with their visual capabilities. |
严重
[准确性]
"raise crayfish"
理由: Incorrect translation: 'crayfish' should be 'OpenClaw' (the software). The source '小龙虾' is a slang term for OpenClaw, not literal crayfish. Reference correctly uses 'OpenClaw'. | Literal translation of slang '养龙虾'. In this context, it means running/hosting the 'OpenClaw' software. | Metaphorical “养龙虾” refers to running/hosting OpenClaw, not literally raising crayfish; mistranslation of intended meaning. |
| Qwen/Qwen3.5-9B | Title: If phones could raise lobsters Content: Then old phones could all be put to use, along with their visual capabilities. |
严重
[准确性]
"raise lobsters"
理由: Incorrect translation: 'lobsters' should be 'OpenClaw' - 'lobster' is a slang term for OpenClaw, not a literal lobster | Incorrect terminology. '养龙虾' in this context refers to running/hosting OpenClaw, not raising lobsters. Reference correctly translates as 'run OpenClaw' | Literal “养龙虾/小龙虾” refers to running/hosting OpenClaw in this community context; translating as real lobsters changes the meaning. | Literal translation of slang '养龙虾' (hosting/running OpenClaw) interpreted as animal husbandry. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | If a mobile phone can raise shrimp |
致命
[准确性]
"raise shrimp"
理由: Critical mistranslation: '小龙虾' in this context refers to 'OpenClaw' (a software project), not literal small shrimp. The reference correctly uses 'OpenClaw'. This is a terminology error - '小龙虾' is a slang term for OpenClaw in the Chinese tech community. | Misinterprets “养龙虾” which in this context means running/hosting OpenClaw, not literally raising shrimp or lobsters. | Mistranslation of slang '养龙虾' (raising lobsters), which means hosting/running the 'OpenClaw' software. Literal translation makes no sense in context. | Misses the context that '龙虾' refers to OpenClaw (an AI model), not literal shrimp. Should be 'If smartphones could run OpenClaw' |
| google/gemma-3-12b-it | Title: If phones can raise lobsters Content: Then old phones can be fully utilized, and still have visual capabilities. |
严重
[准确性]
"If phones can raise lobsters"
理由: Literal mistranslation of slang '养龙虾'. In this context, it means running/hosting the 'OpenClaw' software. | Mistranslation: 'raise lobsters' should be 'run OpenClaw' - the Chinese '养龙虾' is slang for hosting OpenClaw, not literal | Source is metaphor for running OpenClaw; translation interprets literally as raising animals, missing OpenClaw/product implication. |
| google/gemma-3-1b-it | If the smartphone can raise lobsters, then old smartphones can be used. |
严重
[准确性]
"If the smartphone can raise lobsters"
理由: Incorrect translation of '养龙虾' which should be 'run OpenClaw'. | "养龙虾" here is metaphor for hosting/running OpenClaw, not literally raising lobsters. | Should be 'run OpenClaw' not 'raise lobsters'. '龙虾' is slang for OpenClaw | Incorrect translation - '小龙虾' (OpenClaw) translated literally as 'lobsters', missing '视觉能力' (vision capabilities) | Literal translation of slang '养龙虾' (host/run OpenClaw). The meaning is completely lost. |
| google/gemma-3-4b-it | If your phone can raise lobsters, Then old phones can all be utilized, and have visual capabilities. |
严重
[准确性]
"raise lobsters"
理由: Misinterprets slang; "养龙虾" refers to running/hosting OpenClaw, not literal lobsters. | The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Raise lobsters' is a literal mistranslation of the slang '养龙虾' (host/run OpenClaw). | 'raise lobsters' should be 'run OpenClaw'. 'have visual capabilities' should be 'they even have vision capabilities'. The metaphor about 'raising' is not accurate to the source context. |
| google/translategemma-12b-it | Title: If you can raise lobsters on your phone... |
严重
[准确性]
"If you can raise lobsters on your phone..."
理由: Complete mistranslation of the slang metaphor '养龙虾' (running OpenClaw) as literally raising animals. | Misses implied reference to running OpenClaw; also omits the content about reusing old phones and vision capability. | Incorrect translation. Should be 'If smartphones could run OpenClaw' not 'raise lobsters'. The metaphor is about running software, not raising animals. | Incorrect translation: 'lobsters' should be 'OpenClaw' (the software). The source uses 龙虾 (lobster) as slang for OpenClaw, not literal lobsters. |
| google/translategemma-4b-it | Title: If Smartphones Can Raise Lobsters |
严重
[准确性]
"If Smartphones Can Raise Lobsters"
理由: In context ‘养龙虾’ is metaphor for running/hosting OpenClaw; translation treats it as literal lobsters and ignores OpenClaw and follow-up content. | The hypothesis translates only the Title and omits the Content. It translates '养龙虾' literally as 'Raise Lobsters' instead of the intended meaning 'run OpenClaw'. | Incomplete translation - only title provided, missing content. '龙虾' in this context refers to 'OpenClaw' not literal lobsters. Should be 'If smartphones could run OpenClaw'. | Incorrect translation - '养龙虾' is slang for 'hosting OpenClaw', not literal lobster raising; '旧手机' means 'old phones', not 'Smartphones' generically | Should be 'run OpenClaw' to match the reference. |
| tencent/HY-MT1.5-1.8B | If mobile phones can raise lobsters |
严重
[术语]
"If mobile phones can raise lobsters"
理由: Incorrect translation of '龙虾' as 'lobsters' instead of 'OpenClaw'. | Failed to recognize the slang '养龙虾' (host/run OpenClaw), translating it literally as 'raise lobsters'. Missed the context of running software on old phones. | Literal translation of 养龙虾; in context '龙虾' refers to OpenClaw, not actual lobsters.
严重
[准确性]
"raise lobsters"
理由: Mistranslation: '龙虾' should be 'OpenClaw' (the project name), not literal 'lobsters'. The source is about running OpenClaw on smartphones. | Source uses '养龙虾' which is slang for running/hosting 'OpenClaw', not literal 'raise lobsters'. The hypothesis translates it literally. | Failed to translate the content sentence entirely. |
| tencent/HY-MT1.5-7B | Title: If Mobile Phones Could Raise Lobsters | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Many people ask about what is different with Dragon Ball Z Proxy? Dragon Ball Z Proxy is not just a regular node service, we have added many special features and optimized experience... | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title: Many people ask about what is different with Dragon Ball Z Airport? Dragon Ball Z Airport is not just a regular node service, we have added many special features and optimized experience... | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: Many people ask what's different about Dragon Ball Z Proxy Provider Currently sign up for free trial Content: What is different about Dragon Ball Z Proxy Provider? Dragon Ball Z Proxy Provider offers more than just ordinary node services; we've added many unique features and optimized experiences... |
轻微
[流畅性]
"Currently sign up for free trial"
理由: Should be 'Dragon Ball Z proxy provider. Currently giving free trials upon registration.' - punctuation and structure issue | Sentence fragment lacks subject/verb agreement compared to the smoother 'Currently giving free trials upon registration'. | Missing period and slightly different structure from reference 'Currently giving free trials upon registration' | Missing preposition/article; should be ‘Currently offering free trials upon registration’ or similar. |
| Qwen/Qwen2.5-7B-Instruct | What's different about Dragon Ball Z Proxy? Dragon Ball Z Proxy is more than just a regular node service; we have added many unique features and optimized the experience... | 通过 (无共识错误) |
| Qwen/Qwen3-14B | Tanti si chiedono cosa distingue l'aeroporto Dragon Ball Z. Attualmente, registrazione per provare gratuitamente. |
严重
[准确性]
"aeroporto Dragon Ball Z"
理由: Translates “机场” literally as airport instead of proxy/provider service, which is the intended technical meaning. | Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. | Literal translation of '机场' (airport) which is slang for 'proxy provider/SSR node service'. Should be translated as 'provider' or 'service'. |
| Qwen/Qwen3.5-4B | Title: Many ask what makes Dragon Ball Z airport different. Currently registering gives an experience trial. Content: What is different about Dragon Ball Z airport? Dragon Ball Z airport is not just ordinary node services; we have added many special features and optimized experiences... | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Many people ask what's different about Dragon Ball Z proxy. Currently registering gives you a free trial. Content: What makes Dragon Ball Z proxy different? Dragon Ball Z proxy is not just an ordinary node service; we have added many unique features and optimized experiences... |
轻微
[术语]
"Dragon Ball Z proxy"
理由: Should be 'Dragon Ball Z proxy provider' | Incomplete terminology. Should be 'Dragon Ball Z proxy provider' for clarity. Reference includes 'provider' | “机场” in this context is a proxy/VPN provider service; omitting “provider/service” is slightly less precise though generally understandable. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Many people ask what's different about Dragon Ball Z Airport. Currently, we offer registration bonuses with experience. | 通过 (无共识错误) |
| google/gemma-3-12b-it | Title: Many people ask what's different about Dragon Ball Z proxy providers. Currently, we offer a trial upon registration. Content: What makes Dragon Ball Z proxy providers different? Dragon Ball Z proxy providers aren't just ordinary node services; we've added many unique features and optimized the user experience... | 通过 (无共识错误) |
| google/gemma-3-1b-it | "en": "What are the differences between the airport of Dragon Ball Z? The Dragon Ball Z airport is not just a regular node service; we have added many special features and optimization experiences...", "zh-TW": "龙珠Z机场有哪些不同? 龙珠Z机场不仅仅是一个普通的节点服务,我们加入了许多特色功能和优化体验……", "ja": "ドラゴンボールZの空港にはどのような違いがありますか? ドラゴンボールZの空港は、単なる普通のノードサービスではなく、多くの特長と最適化された体験を追加しました……” , "ko": "Dragon Ball Z 공항에는 어떤 차이가 있나요? 드래곤 볼 Z 공항은 단순한 일반 노드 서비스가 아니라 다양한 특장과 최적화된 경험을 추가했습니다…", "de": "Was sind die Unterschiede zwischen dem Flughafen von Dragon Ball Z? Der Dragon Ball Z Flughafen ist nicht nur ein normales Node-Dienst, sondern wir haben viele Spezialfunktionen und Optimierungen hinzugefügt…" , "fr": "Quelles sont les différences entre le terminal de Dragon Ball Z ? Le terminal de Dragon Ball Z n'est pas seulement un service de nœuds ordinaire ; nous avons ajouté de nombreux avantages et expériences d'optimisation..." } | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: Many people ask about the differences between Dragonball Z Airport. Currently, we offer free trial experience upon registration. |
严重
[准确性]
"Many people ask about the differences between Dragonball Z Airport. Currently, we offer free trial experience upon registration."
理由: The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Airport' is a literal translation of '机场' (slang for proxy provider), which is confusing without context. Content is missing. | Incomplete translation - only translates title, missing content. 'Dragonball Z Airport' should be 'Dragon Ball Z proxy provider'. Missing the detailed explanation about special features and optimized experience. | Incorrect: 'Dragonball Z' should be 'Dragon Ball Z' (correct title) |
| google/translategemma-12b-it | Title: Many people ask what is different about Dragon Ball Z proxy providers. Currently, we offer a trial upon registration. Content: What makes Dragon Ball Z proxy providers different? Dragon Ball Z proxy providers are not just ordinary node services; we have added many special features and optimized the user experience... | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Many people ask: What is the difference between Dragonball Z Airport? Currently, we offer free trial upon registration. |
严重
[准确性]
"Dragonball Z Airport"
理由: ‘机场’ in this context means proxy/VPN provider, not literal airport; mistranslation of technical slang. | Misunderstanding - '龙珠Z机场' is a brand name for a proxy service, not 'Dragonball Z Airport'; missing content section with details | The hypothesis translates only the Title (merging it with the first sentence of content) and omits the rest of the Content. It translates '机场' literally as 'Airport' instead of 'proxy provider'.
严重
[术语]
"Dragonball Z Airport"
理由: Should be 'proxy provider' to match the reference. | Should be 'Dragon Ball Z proxy provider' not 'Airport' for consistency with reference terminology. | Incorrect translation - '机场' is slang for 'proxy provider/service', not literal airport; '龙珠Z' should be 'Dragon Ball Z' (proper name) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 很多人问 龙珠Z 机场有什么不同 目前是注册送体验' in en |
致命
[准确性]
"Translation of 'Title: 很多人问 龙珠Z 机场有什么不同 目前是注册送体验' in en"
理由: Meta-text indicating a translation, but no actual translation content is provided. Content/body is entirely missing. | The hypothesis is a meta-prompt string instead of a translation. It fails to translate the title or the content. | Hypothesis is a placeholder, not an actual translation. Content translation is also missing. |
| tencent/HY-MT1.5-7B | Many people ask what makes Dragon Ball Z Airport different. Currently, registration gives you a trial experience. |
严重
[准确性]
"Airport"
理由: Similar to previous items, '机场' here means 'proxy provider', not an actual airport. The literal translation obscures the meaning. | Should be 'Dragon Ball Z proxy provider' not 'Dragon Ball Z Airport'. '机场' is slang for proxy provider. | “机场” here means a proxy service, not a physical airport; it should be translated as 'proxy provider', 'node service provider', or similar. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Classic Car Washing Issues? Qwen3.5 35B Test Successful |
严重
[准确性]
"Classic Car Washing Issues?"
理由: Missing specific term 'Logic' in the title | Misinterprets “洗车问题” which refers to a classic logic puzzle, not literal ‘car washing issues’. | '洗车问题' is a reference to a classic logic puzzle (Chinese '洗车问题' refers to the 'Car Washing Problem' logic puzzle), not literal car washing. Should be 'Classic Car Washing Logic Problem'. | Should be '[Classic Car Washing Logic Problem]' not 'Issues?'; missing brackets and incorrect punctuation | Mistranslation of '洗车问题' (Car Washing Logic Problem, a specific AI benchmark) as generic 'Issues'. Missing content regarding runtime environment. |
| CohereLabs/tiny-aya-water | Title: Classic Car Washing Questions? Qwen3.5 35B Test Successful |
严重
[准确性]
"Classic Car Washing Questions?"
理由: Incomplete translation. Should be '[Classic Car Washing Logic Problem]' to match reference. The question mark changes the meaning. | Mistranslated '洗车问题' (logic puzzle term) as literal car washing questions. | ‘洗车’ here refers to a classic logic puzzle meme, not literal car washing; question mark also changes the title form. | Mistranslation: '经典洗车问题' is a reference to a classic logic puzzle 'car washing problem', not literal 'car washing questions' |
| Qwen/Qwen2.5-14B-Instruct | Title: [Classic Car Wash Issue] Qwen3.5 35B Test Successful Content: Running Environment: llama-server.exe `-m Qwen3.5-35B-A3B-UD-Q3_K_XL.gguf` `-c 8192` `-np 1` `--n-gpu-layers 18' |
严重
[准确性]
"[Classic Car Wash Issue]"
理由: Misinterprets ‘洗车’ which is an idiomatic logic-puzzle reference; reference clarifies as ‘Car Washing Logic Problem’ rather than literal car wash issue. | Should be 'Classic Car Washing Logic Problem' - 'Car Wash' is incomplete, should be 'Car Washing Logic Problem' | Mistranslation of '洗车问题' which is a specific AI logic benchmark ('Car Washing Logic Problem'), not a literal car wash. | Incorrect translation: '洗车问题' refers to a logic puzzle, not literal car washing |
| Qwen/Qwen2.5-7B-Instruct | Title: [Classic Car Wash Issues] Qwen3.5 35B Test Successful Content: Run Environment: llama-server.exe ` -m Qwen3.5-35B-A3B-UD-Q3_K_XL.gguf ` -c 8192 ` -np 1 ` --n-gpu-layers 18 |
严重
[准确性]
"[Classic Car Wash Issues]"
理由: Should be 'Classic Car Washing Logic Problem' not 'Classic Car Wash Issues'. The reference clarifies this is about a logic problem, not general issues. | Source “经典洗车问题” is a known idiom meaning ‘classic trick question/logic puzzle’; literal ‘Car Wash Issues’ distorts the meaning, losing the logic-puzzle nuance. | Mistranslation of '洗车问题'. In AI/LLM context, this refers to the 'Car Washing' logic puzzle/test, not literal car washing issues. 'Logic Problem' or 'Test' is needed. | Incorrect translation of '洗车问题' - should be 'Car Washing Logic Problem' (a classic logic puzzle), not 'Car Wash Issues' (which implies problems/issues). |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Complete failure to translate; output contains only artifacts/tags and misses both Title and Content. | Output is garbage / markup residue unrelated to the source; completely fails to translate title and technical runtime configuration description. | Hypothesis is incomplete/malformed. Should contain proper translation of the source content. |
| Qwen/Qwen3.5-4B | Title: [Classic Car Wash Problem] Qwen3.5 35B Test Successful Content: Runtime Environment: llama-server.exe `-m Qwen3.5-35B-A3B-UD-Q3_K_XL.gguf` `-c 8192` `-np 1` --n-gpu-layers 18 |
轻微
[准确性]
"[Classic Car Wash Problem]"
理由: Literal translation of slang '洗车'. In the context of LLMs, this refers to a specific logic benchmark/test, not washing a vehicle. | Should be 'Car Washing Logic Problem' as per reference. The hypothesis omits 'Logic' which is part of the classic problem name | Slightly less precise than “Classic Car Washing Logic Problem”; omits “logic” aspect but context remains mostly clear. |
| Qwen/Qwen3.5-9B | Title: [Classic Car Wash Issue] Qwen3.5 35B Test Successful Content: Runtime Environment: llama-server.exe `-m Qwen3.5-35B-A3B-UD-Q3_K_XL.gguf` -c 8192 `-np 1` --n-gpu-layers 18 |
严重
[准确性]
"[Classic Car Wash Issue]"
理由: Incorrect translation: 'Car Wash Issue' should be 'Car Washing Logic Problem' - the reference refers to a classic logic puzzle, not a car wash issue | Literal translation of '洗车问题' fails to recognize the specific AI benchmark context ('Car Washing Logic Problem'). | “洗车问题” in this meme context refers to a classic puzzle/logic problem, not literal car washing; mistranslation alters meaning. | Inaccurate translation. '经典洗车问题' refers to a classic logic problem, not a car wash issue. Reference correctly translates as 'Classic Car Washing Logic Problem' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: [Classic Car Wash Issue] Qwen3.5 35B Test Successful |
严重
[准确性]
"[Classic Car Wash Issue]"
理由: Mistranslation: '洗车问题' in this context refers to a 'Car Washing Logic Problem' (a classic logic puzzle), not 'Car Wash Issue'. The reference correctly uses 'Classic Car Washing Logic Problem'. | Literal rendering of “洗车问题”, missing the idiomatic meaning of a classic ‘car washing’ logic puzzle. | Mistranslation of '洗车问题'. In AI/LLM context, this refers to the 'Car Washing' logic puzzle/benchmark, not a literal car wash service issue. |
| google/gemma-3-12b-it | Title: [Classic Car Washing Problem] Qwen3.5 35B Test Successful Content: Running environment: llama-server.exe ` -m Qwen3.5-35B-A3B-UD-Q3_K_XL.gguf ` -c 8192 ` -np 1 ` --n-gpu-layers 18 |
轻微
[准确性]
"[Classic Car Washing Problem]"
理由: Misses “Logic” in “Classic Car Washing Logic Problem”, losing a small nuance about puzzle type. | Literal mistranslation of '洗车问题'. In the AI/LLM community context, this refers to a specific logic benchmark/test case, not literally washing cars. | Reference uses 'Classic Car Washing Logic Problem' - missing 'Logic' |
| google/gemma-3-1b-it | Running environment: llama-server.exe -m Qwen3.5-35B-A3B-UD-Q3_K_XL.gguf -c 8192 -np 1 --n-gpu-layers 18 | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: [Classic Car Washing Issue] Qwen3.5 35B Test Successful |
严重
[准确性]
"[Classic Car Washing Issue]"
理由: "洗车问题" here refers to a classic "car washing" logic puzzle, not an issue with physically washing cars; "logic problem" is missing. | Should be '[Classic Car Washing Logic Problem]' not '[Classic Car Washing Issue]'. The term 'Logic Problem' is more precise in technical context. | Incorrect: 'Issue' should be 'Logic Problem' (洗车问题 is a classic logic puzzle, not an actual issue) | The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Car Washing' is a literal translation of the slang '洗车' (logic puzzle/test), missing the idiomatic meaning. Content is missing. |
| google/translategemma-12b-it | Title: [Classic Car Washing Issue] Qwen3.5 35B Test Successful Content: Running environment: llama-server.exe ` -m Qwen3.5-35B-A3B-UD-Q3_K_XL.gguf ` -c 8192 ` -np 1 ` --n-gpu-layers 18 | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: [Classic Car Washing Issues] Qwen3.5 35B Test Successful |
严重
[准确性]
"[Classic Car Washing Issues]"
理由: ‘洗车问题’ in this forum context refers to a classic logic puzzle, not literal car washing; meaning is lost. | Incorrect translation - '洗车问题' is a reference to the classic 'two-sum' or logic problem in programming, not literal car washing; should be 'Classic Car Washing Logic Problem' | The hypothesis translates only the Title and omits the Content. It translates '洗车问题' literally as 'Car Washing Issues' instead of the AI benchmark term 'Car Washing Logic Problem'. | Should be 'Classic Car Washing Logic Problem' to better match the reference and convey the technical nature of the issue. |
| tencent/HY-MT1.5-1.8B | Translation of '经典洗车问题' in en |
致命
[准确性]
"Translation of '经典洗车问题' in en"
理由: The hypothesis is a meta-prompt fragment. It fails to translate the full title and completely omits the content. | Hypothesis is a placeholder, not an actual translation. Content translation is also missing. | Only a meta description, not an actual translation of the title. Entire title and content are effectively untranslated. |
| tencent/HY-MT1.5-7B | "en": "Title: [Classic Car Washing Questions] Qwen3.5 35B Test Successful", "zh-TW": "Title: [經典洗車問題] Qwen3.5 35B測試成功", "ja": "Title: [クラシックな洗車に関する質問] Qwen3.5 35Bテスト成功", "ko": "Title: [고전적인 자동차 세척에 관한 질문들] Qwen3.5 35B 테스트 성공", "de":="Titel: [Klassische Fragen zur Autowäsche] Qwen3.5 35B Test erfolgreich", "fr": "Titre: [Questions classiques sur le lavage de voitures] Qwen3.5 35B test réussi", "it": "Titolo: [Domande classiche sul lavaggio delle auto] Qwen3.5 35B testato con successo"}|im_end|> |
严重
[准确性]
"[Classic Car Washing Questions]"
理由: The term '洗车' (car washing) in this specific AI/LLM context likely refers to a 'logic problem' or benchmark (as per reference 'Logic Problem'), or is a specific meme. Translating it literally as 'Car Washing Questions' misses the technical nuance implied by the reference. | Should be 'Classic Car Washing Logic Problem' not 'Classic Car Washing Questions'. Reference specifies 'Logic Problem'. | The source is “经典洗车问题” and the reference is “Classic Car Washing Logic Problem”; the 'logic' sense is omitted here, and rendering it only as a generic 'question' weakens the meme-specific meaning. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Another Mac mini for my hobby |
严重
[准确性]
"Another Mac mini for my hobby"
理由: '养龙虾' is slang for hosting/running OpenClaw, not a hobby. Should be 'Another Mac mini to host OpenClaw'. | Mistranslation of '养龙虾' (host OpenClaw) as generic 'hobby'. Missing content about price increase and intent. | Vague and incomplete; should be 'Got another Mac mini to host OpenClaw' to capture the specific purpose | Omits key information about using the Mac mini to host/play with OpenClaw and the price increase; ‘for my hobby’ is too vague and changes meaning. | Vague translation of '养龙虾' instead of 'host OpenClaw' |
| CohereLabs/tiny-aya-water | Title: Another Mac mini for raising crabs |
严重
[准确性]
"raising crabs"
理由: Mistranslation of '养龙虾'. '龙虾' is slang for OpenClaw, not literal crabs. Should be 'host OpenClaw' | Mistranslated slang '养龙虾' (host OpenClaw) as literal 'raising crabs'. | Mistranslation: '龙虾' is slang for 'OpenClaw' (AI agent), not literal 'crab'; '养' means 'host/run' (software), not 'raise' | Key metaphor mistranslated; refers to running/hosting OpenClaw (‘养龙虾’) not raising crabs. | The translation of '养龙虾' as 'raising crabs' is incorrect. It should be 'hosting OpenClaw' as in the reference. |
| Qwen/Qwen2.5-14B-Instruct | Title: Got Another Mac mini for Lobster Keeping Content: Bought one last year for daily use, didn't expect it to increase in price this year. Planning to keep some lobsters, so got another one. |
严重
[准确性]
"for Lobster Keeping"
理由: Literal mistranslation of '养龙虾' (hosting OpenClaw). | Misinterprets ‘养龙虾’ which in this context means running/hosting OpenClaw, not keeping real lobsters. | Should be 'Got another Mac mini to host OpenClaw' - 'lobster' is slang for OpenClaw | Incorrect: 'Lobster Keeping' should be 'hosting OpenClaw' - '养龙虾' is a metaphor for hosting OpenClaw
严重
[准确性]
"keep some lobsters"
理由: Incorrect: 'keep some lobsters' should be 'play around with OpenClaw' | Same misinterpretation of metaphor; should refer to playing with/hosting OpenClaw. | Literal mistranslation of '玩玩龙虾' (play around with OpenClaw). |
| Qwen/Qwen2.5-7B-Instruct | Title: Got another Mac mini for shrimp farming Content: Bought one last year for daily use, but it's more expensive this year. Planning to play around with shrimp farming, got another one. |
严重
[准确性]
"shrimp farming"
理由: Should be 'to host OpenClaw' not 'for shrimp farming'. The slang '养龙虾' means hosting/running OpenClaw, not farming shrimp. | Mistranslation of '养龙虾' (hosting/running OpenClaw). 'Shrimp farming' implies agriculture. | Incorrect translation of '养龙虾' - should be 'host OpenClaw' (run the software), not 'shrimp farming' (breeding crustaceans). The source uses '龙虾' as slang for OpenClaw. | “养龙虾” is slang for hosting OpenClaw; literal ‘shrimp farming’ misinterprets the technical slang and the specific product OpenClaw. |
| Qwen/Qwen3-14B | Comprato un altro Mac mini per allevare gamberi |
严重
[准确性]
"allevare gamberi"
理由: Mistranslation of slang '养龙虾' (host OpenClaw) as 'raise shrimp'. Omits 'Content' section. | Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. | Literal “raise shrimps” instead of using the OpenClaw metaphor for hosting/running software. | Mistranslation: '养龙虾' means 'host/run OpenClaw', not 'allevare gamberi' (raise shrimp). Should be 'Comprato un altro Mac mini per eseguire OpenClaw'. |
| Qwen/Qwen3.5-4B | Title: Bought another Mac mini to raise crabs Content: Bought one last year for daily use; didn't expect prices to rise again this year. Planning to try raising crabs, so bought another one. |
严重
[准确性]
"raise crabs"
理由: “养龙虾” here means using/hosting OpenClaw; also changes animal from lobster to crabs, compounding the error. | 'crabs' is incorrect - '龙虾' in source is slang for OpenClaw (the software), not literal crabs. Reference correctly uses 'host OpenClaw'. | Literal translation of slang '养龙虾' (raising lobster/crab). Context implies hosting/running the 'OpenClaw' software. |
| Qwen/Qwen3.5-9B | Title: Got another Mac mini to raise lobsters Content: Bought one last year for daily use, and surprisingly, prices went up this year. Preparing to play around with lobsters, so got another one. |
严重
[准确性]
"raise lobsters"
理由: Incorrect terminology. '养龙虾' refers to hosting/running OpenClaw, not raising lobsters. Reference correctly translates as 'host OpenClaw' | Literal translation of slang '养龙虾' (host OpenClaw) interpreted as animal husbandry. | Incorrect translation: 'raise lobsters' should be 'host OpenClaw' - 'lobster' is a slang term for OpenClaw, not a literal lobster | “养龙虾” is community slang for running/hosting OpenClaw; literal “raise lobsters” misrepresents the activity. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Another Mac mini for shrimp cultivation |
致命
[准确性]
"shrimp cultivation"
理由: Critical mistranslation: '养龙虾' in this context means 'to host OpenClaw' (run the software), not literal shrimp cultivation. The reference correctly uses 'to host OpenClaw'. This is a terminology error - '养龙虾' is a slang term for hosting OpenClaw in the Chinese tech community. | Misinterprets “养龙虾” which refers to running/hosting OpenClaw, not cultivating shrimp. | Misses context that '龙虾' refers to OpenClaw (AI model), not literal shrimp. Should be 'Got another Mac mini to host OpenClaw' | Mistranslation of slang '养龙虾' (hosting OpenClaw). 'Shrimp cultivation' implies farming seafood, which is completely wrong in this tech context. |
| google/gemma-3-12b-it | Title: I Got Another Mac Mini to Raise Lobsters Content: I bought one for daily use last year, and surprisingly, the price increased again this year. Planning to play around with lobsters, so I got another one. |
严重
[准确性]
"Raise Lobsters"
理由: Literal mistranslation of '养龙虾'. Context implies hosting/running the 'OpenClaw' software. | Mistranslation: 'Raise Lobsters' should be 'host OpenClaw' - the Chinese '养龙虾' is slang for hosting OpenClaw, not literal | Original uses “养龙虾” as metaphor for hosting OpenClaw; translation is literal and loses software/hosting meaning. |
| google/gemma-3-1b-it | Translation of 'Title: 又搞了一台 Mac mini 养龙虾' in en |
致命
[准确性]
"Translation of 'Title: 又搞了一台 Mac mini 养龙虾' in en"
理由: Not a translation - placeholder text instead of actual translation, '养龙虾' (host OpenClaw) not translated | No actual translation is provided, just a meta-statement about translation. | Hypothesis is a meta-description, not an actual translation. Should be 'Got another Mac mini to host OpenClaw' | Model failure. Outputted a meta-commentary instruction instead of performing the translation. | Hypothesis is a translation instruction, not the actual translation. |
| google/gemma-3-4b-it | Got another Mac mini to raise lobsters |
严重
[准确性]
"raise lobsters"
理由: The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Raise lobsters' is a literal mistranslation of '养龙虾' (host OpenClaw). Content is missing. | Incomplete translation - only translates title, missing content. 'raise lobsters' should be 'host OpenClaw'. Missing context about price increase and planning to play around with it. | In this forum context "养龙虾" means to host/run OpenClaw, not literal lobsters. |
| google/translategemma-12b-it | Title: I got another Mac mini to run Lobster. |
严重
[准确性]
"I got another Mac mini to run Lobster."
理由: Incomplete translation. Missing context about buying one last year and price increase. Also 'Lobster' should be 'OpenClaw'. | Mistranslation of '养龙虾' (host/run OpenClaw) as running the animal. | Incorrect: 'Lobster' should be 'OpenClaw' (the software). The source uses 龙虾 (lobster) as slang for OpenClaw. |
| google/translategemma-4b-it | Another Mac mini to keep lobsters |
严重
[准确性]
"Another Mac mini to keep lobsters"
理由: The hypothesis translates only the Title and omits the Content. It translates '养龙虾' literally as 'keep lobsters' instead of 'host OpenClaw'. | Again ‘养龙虾’ is metaphor for hosting OpenClaw; translation is literal and omits OpenClaw and the explanatory content. | Incorrect translation - '养龙虾' is slang for 'hosting OpenClaw', not literal lobster keeping | Should be 'host OpenClaw' to match the reference. | Incomplete translation - missing title prefix and content. '龙虾' refers to 'OpenClaw' not literal lobsters. Should be 'Got another Mac mini to host OpenClaw'. |
| tencent/HY-MT1.5-1.8B | Translation of '又搞了一台 Mac mini 养龙虾' in en |
致命
[准确性]
"Translation of '又搞了一台 Mac mini 养龙虾' in en"
理由: Meta-text only; does not provide an actual translation. Also omits the content sentence about previous purchase and price increase. | The hypothesis is a meta-prompt string. It fails to translate the title or the content. | Hypothesis is a placeholder, not an actual translation. Content translation is also missing. |
| tencent/HY-MT1.5-7B | "en": "Title: Got another Mac mini to raise lobsters", "zh-TW": "Title: Got another Mac mini to raise lobsters", "ja": "Title: 取得另一台Mac mini用来养龙虾", "ko": "Title: 또 다른 Mac mini를 구입하여 랍스터를 키우기 시작", "de":="Titel: Erhalten Sie einen weiteren Mac mini, um Lobster zu züchten", "fr": "Titre: J’ai acheté un autre Mac mini pour élever des lobster“, "it": "Titolo: Ho acquistato un altro Mac mini per allevare i lobster"} |
致命
[准确性]
"raise lobsters"
理由: “养龙虾” is a metaphor for hosting OpenClaw, not for raising real lobsters; it should be rendered as something like “跑 OpenClaw/托管小龙虾”. | Should be 'host OpenClaw' not 'raise lobsters'. '养龙虾' is slang for hosting/running OpenClaw. | Fails to translate the slang '养龙虾' (host OpenClaw), resulting in a literal and incorrect meaning about animal farming. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: What is the difference between a tool and a script? |
致命
[准确性]
"What is the difference between a tool and a script?"
理由: Completely incorrect; should be 'Why does my OpenClaw tell me it has no tools to use?' | Entirely incorrect translation of the source content | Complete hallucination; the output does not reflect the source text about 'OpenClaw' (小龙虾) having no tools. | Completely unrelated to the source about OpenClaw reporting no tools available; total mistranslation. | Completely wrong translation - source asks about OpenClaw not having tools, hypothesis asks about tool vs script difference |
| CohereLabs/tiny-aya-water | Content: message - Send messages and channel operations sessions_list - List other sessions (including sub-agents) ... only these can be done | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Invia messaggi e operazioni sul canale |
致命
[准确性]
"Invia messaggi e operazioni sul canale"
理由: Hypothesis is in Italian, not English. Should be in English translation of the source Chinese text | Completely wrong language: hypothesis is in Italian, not English/Chinese translation | Wrong target language (Italian instead of English) and missing content (Title and rest of Content omitted). | Target is in Italian, not the required target language (Chinese), and only translates part of the source; most content is omitted. |
| Qwen/Qwen2.5-7B-Instruct | Title: Why is my LinuxDo shrimp telling me he has no tools to use? Content: message - send messages and channel operations sessions_list - list other sessions (including sub-agents) ... can only do these |
严重
[准确性]
"LinuxDo shrimp"
理由: Incorrect translation of '小龙虾' - should be 'OpenClaw', not 'LinuxDo shrimp'. The source uses '小龙虾' as slang for OpenClaw, and 'LinuxDo' is incorrectly added. | “小龙虾” in this context is OpenClaw instance; adding “LinuxDo shrimp” introduces incorrect brand and keeps literal animal metaphor, losing the intended product reference. | Should be 'Why does my OpenClaw tell me it has no tools to use?' not 'LinuxDo shrimp'. Incorrect reference to LinuxDo and wrong pronoun. | Hallucination/Over-translation. Source says '小龙虾' (OpenClaw/Little Lobster). 'LinuxDo' is the name of the forum, not part of the project name here. The reference correctly identifies it as 'OpenClaw'. |
| Qwen/Qwen3-14B | Perché il mio gambero mi ha detto che non aveva strumenti da usare? |
严重
[准确性]
"gambero"
理由: Mistranslation: '小龙虾' refers to OpenClaw (software), not literal 'gambero' (crayfish). Should be 'il mio OpenClaw'. | Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. | Mistranslation of slang '小龙虾' (OpenClaw) as 'shrimp/prawn'. Omits 'Content' section. | Literal “shrimp” instead of OpenClaw metaphor; reference clarifies it should be the software name. |
| Qwen/Qwen3.5-4B | Why does my crayfish tell me he has no tools to use? |
严重
[准确性]
"my crayfish"
理由: Metaphorical “小龙虾” refers to OpenClaw instance; literal “crayfish” misrepresents the subject. | Incorrect translation: 'crayfish' should be 'OpenClaw' (the software). The source '小龙虾' is a slang term for OpenClaw, not literal crayfish. Reference correctly uses 'OpenClaw'. | The hypothesis translates the Chinese source text literally ('小龙虾' as 'crayfish'), failing to recognize it as a slang term or nickname for 'OpenClaw' as indicated by the reference. Additionally, the hypothesis omits the translation of the 'Content' section entirely, resulting in significant missing content. |
| Qwen/Qwen3.5-9B | Title: Why does my crayfish tell me he has no tools? Content: message - Send messages and channel operations sessions_list - List other sessions (including sub-agents)... can only do these |
严重
[准确性]
"crayfish"
理由: Incorrect translation: 'crayfish' should be 'OpenClaw' - '小龙虾' is a slang term for OpenClaw, not a literal crayfish | Mistranslation of slang '小龙虾' which refers to 'OpenClaw', not the animal. | “小龙虾” here is the OpenClaw bot, not an actual crayfish; using the animal term misleads about the subject. | Incorrect terminology. '小龙虾' in this context is slang for OpenClaw, not crayfish. Reference correctly translates as 'OpenClaw' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | 為什麼我的小龍蝦告訴我他沒有工具要用? |
致命
[准确性]
"小龍蝦"
理由: Text is in Traditional Chinese but source is Simplified Chinese. Also '小龍蝦' (shrimp) should be 'OpenClaw' based on context | Here refers to “OpenClaw” rather than an actual small lobster; the branded/technical meaning is lost. | Mistranslation of slang '小龙虾' (OpenClaw). Literal translation 'Crayfish/Small Lobster' loses the software reference. | Critical mistranslation: '小龙虾' in this context refers to 'OpenClaw', not literal 'small lobster'. The reference correctly uses 'OpenClaw'. This is a terminology error - '小龙虾' is a slang term for OpenClaw in the Chinese tech community. Also, the translation is in Traditional Chinese (zh-TW) while the target language is zh (Simplified Chinese). |
| google/gemma-3-12b-it | Title: Why is my crawfish telling me he has no tools to use? Content: message - Send messages and channel operations sessions_list - List other sessions (including child agents) ... only these can be done. |
严重
[准确性]
"crawfish"
理由: “小龙虾” here is nickname for OpenClaw; translating as the animal loses the software/product reference. | Mistranslation: 'crawfish' should be 'OpenClaw' - the Chinese '小龙虾' is a nickname for OpenClaw, not literal | Mistranslation of '小龙虾'. While biologically similar, the community slang specifically uses 'Lobster' (or OpenClaw), and 'crawfish' changes the established term. |
| google/gemma-3-1b-it | Why my small shrimp told me he didn't have tools? |
严重
[准确性]
"small shrimp"
理由: Literal translation of '小龙虾' (OpenClaw codename). Loses the specific tech reference. | Incorrect translation - '小龙虾' (OpenClaw) translated as 'small shrimp', grammar issues in English | "小龙虾" refers to OpenClaw, not a literal shrimp. | Should be 'OpenClaw' not 'small shrimp'. '小龙虾' is slang for OpenClaw
严重
[流畅性]
"told me he didn't have tools"
理由: Grammatically incorrect. Should be 'Why does my OpenClaw tell me it has no tools to use?' | Ungrammatical phrasing in English; should be "told me it had no tools" or similar. | Grammatically incorrect ('Why my... told'). Missing translation of the Content list. |
| google/gemma-3-4b-it | Why is my crayfish telling me he doesn’t have any tools to use? |
严重
[准确性]
"crayfish"
理由: 'crayfish' should be 'OpenClaw'. 'telling me he' is anthropomorphizing incorrectly - should be 'tell me it'. Missing the content about message operations and sessions_list. | Misinterprets "小龙虾" which is slang for OpenClaw or a related instance, not literal crayfish. | The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Crayfish' is an incorrect literal translation of '小龙虾' (slang for OpenClaw). Content is missing. |
| google/translategemma-12b-it | Title: Why is my crawfish telling me it doesn't have any tools available? |
严重
[准确性]
"crawfish"
理由: Incorrect: 'crawfish' should be 'OpenClaw' (the software). The source uses 小龙虾 (crawfish/lobster) as slang for OpenClaw. | Should be 'OpenClaw' not 'crawfish'. The source uses '小龙虾' as a codename for OpenClaw, not literal crawfish. | Mistranslation of '小龙虾' (slang for OpenClaw) as the animal. |
| google/translategemma-4b-it | Why does my crayfish tell me he doesn't have any tools? |
严重
[准确性]
"my crayfish"
理由: ‘小龙虾’ here is the OpenClaw instance, not an actual crayfish; should map to OpenClaw. | The hypothesis translates only the Title and omits the Content. It translates '小龙虾' literally as 'crayfish' instead of the slang 'OpenClaw'. | '小龙虾' in this context is slang for 'OpenClaw', not literal crayfish. Should be 'OpenClaw' throughout. | Missing translation - content section with technical details not translated |
| tencent/HY-MT1.5-1.8B | Why does my crayfish tell me he has no tools to use? |
严重
[术语]
"Why does my crayfish tell me he has no tools to use?"
理由: Translated the slang '小龙虾' (OpenClaw) literally as 'crayfish'. Also failed to translate the content section. | '小龙虾' here refers to OpenClaw, not an actual crayfish. Should preserve the product name or correct metaphor. | Incorrect translation of '小龙虾' as 'crayfish' instead of 'OpenClaw'. |
| tencent/HY-MT1.5-7B | "en": "Why does my crayfish tell me it doesn’t have any tools?", "zh-TW": "為什麼我的小龍蝦告訴我它沒有工具使用?", "ja": "なぜ私のザリガニが道具がないと言うのでしょうか?", "ko": "왜 제 작은 소라게가 도구가 없다고 말하는 걸까요?", "de": "Warum sagt mir mein Krebs, dass er keine Werkzeuge hat?」 , "fr": "Pourquoi mon homard me dit-il qu’il n’a pas d’outils ?", "it": "Perché il mio granchio mi dice che non ha strumenti a disposizione?" }|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Experience playing CLaw without restrictions |
严重
[准确性]
"Experience playing CLaw without restrictions"
理由: Incomplete and inaccurate; should be 'Experience from playing around with OpenClaw' and doesn't mention restrictions are self-imposed | Incorrect translation of '瞎玩体会' as 'playing without restrictions' instead of 'playing around with OpenClaw' | Mistranslation of '小龙虾' (OpenClaw) as 'CLaw'. The phrase 'without restrictions' contradicts the source '一直都在限制' (always restricting). Content is missing. | Adds ‘without restrictions’ which contradicts source that mentions imposing restrictions for safety, and misses OpenClaw name nuance. | '小龙虾' is OpenClaw, not 'CLaw'. Should be 'OpenClaw'. |
| CohereLabs/tiny-aya-water | Title: CLaw Experience playing alone |
严重
[准确性]
"CLaw Experience playing alone"
理由: Mistranslation: '小龙虾' is slang for 'OpenClaw', not 'CLaw'; incomplete translation | Mistranslated '小龙虾' (OpenClaw) as 'CLaw'. Missed the nuance of '瞎玩' (playing around/messing around). | Severely incomplete and incorrect. Should be 'Experience from playing around with OpenClaw'. '小龙虾瞎玩体会' means casual experience with OpenClaw, not 'CLaw Experience playing alone' | ‘小龙虾瞎玩体会’ refers to casually playing with OpenClaw; ‘playing alone’ and ‘CLaw’ lose the ‘small shrimp’ / OpenClaw nuance and casual tone. |
| Qwen/Qwen2.5-14B-Instruct | Title: Random Play Experience with Xiaolongxia Content: Played openclaw for a few days, just chatting myself, worried about security issues, always limiting... Thinking about this thing is really just for play, actually all scripts,养成类游戏 translates to养成 type game, slowly playing |
严重
[术语]
"Xiaolongxia"
理由: Leaves ‘小龙虾’ untranslated as pinyin; in this forum it refers to OpenClaw and should be rendered accordingly. | Incorrect: 'Xiaolongxia' should be 'OpenClaw' - it's a nickname, not to be transliterated | Failed to translate the slang '小龙虾' to its intended meaning 'OpenClaw'.
轻微
[流畅性]
"养成类游戏 translates to养成 type game, slowly playing"
理由: Different from reference: 'slowly playing' vs 'take my time playing with it' | Translator artifact included in the output; failed to translate the term naturally. | Should not include meta-commentary about translation in the output; should be 'like a pet-raising game' | Meta-commentary intrudes into the translation; contains untranslated Chinese text and explanation, not a coherent target sentence. |
| Qwen/Qwen2.5-7B-Instruct | Title: Crayfish Random Play Experience Content: Played openclaw for a few days, purely my own chat, afraid of security issues, always restricted... Think about it, this is really just for fun, actually all scripts,养成类游戏, play slowly |
严重
[准确性]
"Crayfish Random Play Experience"
理由: “小龙虾” refers to OpenClaw; literal ‘Crayfish’ changes the topic from software to animal. | Should be 'Experience from playing around with OpenClaw' not 'Crayfish Random Play Experience'. Inaccurate translation of the slang. | Incorrect translation of '小龙虾' - should be 'OpenClaw', not 'Crayfish'. Also '瞎玩体会' means 'experience from playing around', not 'Random Play Experience'. | Mistranslation of '小龙虾' as 'Crayfish' (biological) instead of the project 'OpenClaw'. '瞎玩' is 'playing around' not 'Random Play'.
严重
[准确性]
"养成类游戏"
理由: Untranslated text left in the output. | Chinese text left untranslated in English output. Should be 'like a pet-raising game'. | Keeps 'openclaw' in lowercase instead of 'OpenClaw'. Also leaves '养成类游戏' untranslated when it should be 'pet-raising game'. |
| Qwen/Qwen3-14B | Esperienza di gioco con OpenClaw |
严重
[准确性]
"Esperienza di gioco con OpenClaw"
理由: Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. | Only provides a brief title-like phrase, omitting detailed content about how the user restricts it, security concerns, and viewing it as a pet-raising scripted game. | Omission of the entire 'Content' section describing the user's experience and security concerns. |
| Qwen/Qwen3.5-4B | Title: My Experience Playing Crayfish Blindly Content: After playing OpenClaw for a few days, I've been chatting purely on my own, fearing security issues and constantly restricting things... Thinking about this thing is really just for fun; actually, it's all scripts, an idle game. Played slowly. |
严重
[准确性]
"Crayfish"
理由: 'Crayfish' should be 'OpenClaw' - the source '小龙虾' is a slang term for OpenClaw, not literal crayfish. | Literal translation of slang '小龙虾'. Refers to the software 'OpenClaw'. | “小龙虾” is a nickname for OpenClaw; literal “crayfish” mistranslates the software reference. |
| Qwen/Qwen3.5-9B | Title: My Experience Playing Xiaolongxia Blindly Content: After playing OpenClaw for several days, I've been chatting purely on my own. Worried about security issues, I've always been restricted... Thinking about this thing really is just for fun; practically, it's all scripts, a cultivation-style game. Played slowly. |
严重
[准确性]
"Xiaolongxia"
理由: Inaccurate translation. '小龙虾瞎玩体会' means 'Experience from playing around with OpenClaw', not 'Playing Xiaolongxia Blindly'. Reference is more accurate | Title “小龙虾瞎玩体会” is a playful name for OpenClaw; leaving “Xiaolongxia” untranslated without indicating it’s OpenClaw may obscure meaning for non-Chinese readers. | Incorrect: 'Xiaolongxia' should be 'OpenClaw' - the slang term should be translated to the actual name; 'Blindly' is an awkward translation of '瞎玩' | Transliteration of slang term instead of translating to the intended referent 'OpenClaw' or adapting the slang appropriately. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | "en": "Translation of '小龙虾瞎玩体会' in en", "zh-TW": "Translation of '小龙虾瞎玩体会' in zh-TW", "ja": "Translation of '小龙虾瞎玩体会' in ja", "ko": "Translation "小龙虾瞎玩体会" in ko", "de": "Translation of '小龙虾瞎玩体会' in de", "fr": "Translation of '小龙虾瞎玩体会' in fr", "it": "Translation of '小龙虾瞎玩体会' in it"} <|im_end|> | 通过 (无共识错误) |
| google/gemma-3-12b-it | Crazy Adventures with Crawfish |
轻微
[准确性]
"Crazy Adventures with Crawfish"
理由: Complete failure to translate the content. Mistranslation of '小龙虾' as 'Crawfish' instead of the software 'OpenClaw'. Omission of the entire body text. | Hypothesis only provides title translation, missing the entire content. Also should be 'Experience from playing around with OpenClaw' not 'Crazy Adventures with Crawfish' | Title does not mention OpenClaw or experience; it invents “Crazy Adventures” and keeps literal crawfish, losing core product meaning and nuance of “impressions from playing” with OpenClaw. | Only Title translated; Content missing entirely |
| google/gemma-3-1b-it | "en": "Translation of '小龙虾瞎玩体会' in en", "zh-TW": "The experience of playing Openclaw for several days, mostly chatting by myself, worried about safety issues, constantly limiting...", Thinking this thing is really just a script, a progression game, slowly played.", "ja": "Openclawを数日プレイした経験、ほとんど会話で。セキュリティの問題に気を付けて、常に制限している...", このものが本当に単なるスクリプトなのか、進歩ゲームなのか、ゆっくりとプレイしていた。", "ko": "오픈클로우를 몇 달 동안 플레이한 경험, 대부분 혼자 이야기하며. 안전 문제에 대한 우려가 있어, 항상 제한하고 있다...", 이 것이 정말 단순한 스크립트인지, 진행 게임인지, 천천히 플레이하는 것일지도 모름.", "de": "Übung mit Openclaw über mehrere Tage, hauptsächlich durch Gespräche. Sorge um Sicherheitsaspekte und beschränkt sich ständig.", "Translation of '小龙虾瞎玩体会' in de", "The experience of playing Openclaw for several days, mostly chatting by myself, worried about security issues, constantly limiting...", Thinking this thing is really just a script, a progression game, slowly played.", "fr": "L'expérience d'une session avec Openclaw pendant plusieurs jours, principalement par des conversations. La préoccupation pour les problèmes de sécurité et la limitation constante...", L'expérience de jouer à Openclaw pendant plusieurs jours, principalement par des conversations.", "it": "L'esperienza di giocare a Openclaw per diversi giorni, principalmente tramite conversazioni. Preoccupazione per i problemi di sicurezza e limitazione costante...", L'esperienza di giocare a Openclaw per diversi giorni, principalmente tramite conversazioni." } |
严重
[准确性]
""en": "Translation of '小龙虾瞎玩体会' in en""
理由: Hypothesis is a translation instruction, not the actual translation. | No real English translation of the title is given, only a placeholder. | Not a translation - placeholder text instead of actual translation, output is in English not Chinese as required |
| google/gemma-3-4b-it | Title: Let's Play Shrimp (LinuxDo) |
轻微
[准确性]
"Let's Play Shrimp (LinuxDo)"
理由: Misreads "小龙虾" which refers to OpenClaw; also omits experiential content about security concerns, scripts, and game-like nature. | The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Shrimp' is an incorrect translation of '小龙虾' (OpenClaw). '(LinuxDo)' is hallucinated/not in source. Content is missing. | Completely incorrect translation: 'Shrimp' should be 'OpenClaw'; 'LinuxDo' is not mentioned in source; title completely wrong | Completely inaccurate translation. Should be 'Experience from playing around with OpenClaw'. 'Let's Play Shrimp (LinuxDo)' bears no relation to the source. Missing entire content about playing with OpenClaw and security concerns. |
| google/translategemma-12b-it | Title: OpenClaw Adventures and Experiences Content: After playing openclaw for a few days, just chatting on my own, I was worried about security issues, so I kept everything restricted... I realized that this is really just for fun, and in reality, it's all scripts, a type of incremental game, to be played slowly. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: Let's Play Open Claws |
严重
[准确性]
"Let's Play Open Claws"
理由: Title should reflect ‘experience from playing around with OpenClaw’; current phrasing changes meaning and plurality, and loses ‘experience’ aspect. | Incorrect translation - '小龙虾' is slang for OpenClaw, not 'Open Claws' (plural wrong); '瞎玩体会' means 'experience from playing around', not 'Let's Play' | The hypothesis translates only the Title and omits the Content. The title translation 'Let's Play Open Claws' is a hallucination/misinterpretation of '小龙虾瞎玩体会' (Experience from playing around with OpenClaw). | Incomplete and inaccurate translation. Should be 'Experience from playing around with OpenClaw'. Missing content section entirely. |
| tencent/HY-MT1.5-1.8B | Translation of '小龙虾瞎玩体会' in en |
致命
[准确性]
"Translation of '小龙虾瞎玩体会' in en"
理由: Meta-text only, no actual translation. Entire title and content missing. | The hypothesis is a meta-prompt string. It fails to translate the title or the content. | Hypothesis is a placeholder, not an actual translation. Content translation is also missing. |
| tencent/HY-MT1.5-7B | Title: Experience Playing with小龙虾 | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: What to do when Shrimp does not reply messages? |
轻微
[术语]
"Shrimp"
理由: Incorrect term 'Shrimp' used instead of 'OpenClaw' | Inconsistent translation of the slang '小龙虾' (previously Crab/Lobster, here Shrimp); should be consistent with the community term 'OpenClaw'. Content is missing. | '小龙虾' is slang for OpenClaw, should be 'OpenClaw' not 'Shrimp' | Over-literal rendering of “小龙虾”, losing the established product name ‘OpenClaw’ and making it sound like an actual shrimp. |
| CohereLabs/tiny-aya-water | Title: What to do when Shrimp does not reply messages? |
严重
[准确性]
"Shrimp"
理由: Mistranslation: '小龙虾' is slang for 'OpenClaw', not 'Shrimp' | The translation of '小龙虾' as 'Shrimp' is incorrect. It should be 'OpenClaw' as in the reference. | Should refer to OpenClaw (小龙虾) as a product, not generic ‘Shrimp’. | Mistranslation of '小龙虾'. This is slang for OpenClaw AI model, not literal shrimp. Should be 'OpenClaw' |
| Qwen/Qwen2.5-14B-Instruct | Title: What to do when Xiao Long Hao doesn't reply? Content: As shown in the picture, Xiao Long Hao suddenly stopped replying frequently. After starting to type on the keyboard, there's no response anymore. Checking the status shows everything is normal. |
严重
[准确性]
"Xiao Long Hao"
理由: Incorrect transliteration/translation of '小龙虾'; should be 'OpenClaw'. | Should be 'OpenClaw' - this is slang for OpenClaw and should be translated as such | Misreads ‘小龙虾’; should refer to OpenClaw or ‘little lobster’, not ‘Hao’. |
| Qwen/Qwen2.5-7B-Instruct | What do I do if Xiao Long Xia doesn't reply to messages? |
严重
[准确性]
"Xiao Long Xia"
理由: In this forum slang, “小龙虾” refers to OpenClaw; leaving pinyin and animal metaphor loses the specific software reference and may confuse readers. | Incomplete translation. Only translates title, missing content. Should be 'What should I do if OpenClaw doesn't reply to messages?' with full content translation. | Missing the entire content paragraph from the source. | Missing 'Title:' label. Incorrect translation of '小龙虾' - should be 'OpenClaw', not 'Xiao Long Xia' (literal translation). Also content is missing. |
| Qwen/Qwen3-14B | What to do if Xiao Long Xiang doesn't reply? |
严重
[准确性]
"Xiao Long Xiang"
理由: Hypothesis is in English, but target language is Chinese (zh). Should be in Chinese. Also, '小龙虾' should be translated as OpenClaw, not transliterated as 'Xiao Long Xiang'. | Mistranscribes “小龙虾” as “Xiao Long Xiang”; should map to OpenClaw or a consistent name; also leaves it untranslated in an otherwise English sentence. | Language Error: Output is in English, target is Chinese. Also transliterates the slang term instead of translating the meaning or using the established term. |
| Qwen/Qwen3.5-4B | What to do if crayfish doesn't reply? |
严重
[准确性]
"crayfish"
理由: 'crayfish' should be 'OpenClaw' - the source '小龙虾' is a slang term for OpenClaw, not literal crayfish. Reference correctly uses 'OpenClaw'. | Literal translation of slang '小龙虾'. Refers to the software 'OpenClaw'. | Again, “小龙虾” stands for OpenClaw; literal animal translation is incorrect. |
| Qwen/Qwen3.5-9B | What to do if Little Crayfish doesn't reply? |
严重
[准确性]
"What to do if Little Crayfish doesn't reply?"
理由: Incorrect translation: 'Little Crayfish' should be 'OpenClaw' - '小龙虾' is slang for OpenClaw; also missing Content portion entirely | Literal translation of slang '小龙虾' (OpenClaw) interpreted as the animal. | Missing content section entirely. Hypothesis only provides title without content translation. Reference includes both title and content | “小龙虾” is OpenClaw in this context; literal “Little Crayfish” changes the referent from a bot/tool to an animal and loses the product name. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | What do I do if 'Small Shrimp' isn't responding to messages? |
致命
[准确性]
"'Small Shrimp'"
理由: Should be 'OpenClaw' not 'Small Shrimp' - '小龙虾' is slang for OpenClaw in this context | Mistranslation of slang '小龙虾' (OpenClaw). Treating it as a literal small shrimp loses the reference to the software. | Should refer to OpenClaw; literal ‘small shrimp’ loses the branded meaning of 小龙虾 in this context. | Critical mistranslation: '小龙虾' in this context refers to 'OpenClaw', not literal 'Small Shrimp'. The reference correctly uses 'OpenClaw'. This is a terminology error - '小龙虾' is a slang term for OpenClaw in the Chinese tech community. |
| google/gemma-3-12b-it | Title: What to do when Xiaolongxia doesn't reply? |
严重
[准确性]
"Xiaolongxia"
理由: Hypothesis only provides title translation, missing the entire content. Also should be 'What should I do if OpenClaw doesn't reply to messages?' not 'Xiaolongxia' | This is a product nickname (OpenClaw), but left as pinyin without clarifying; reference resolves it to OpenClaw. Could confuse target readers about intended referent. | Only Title translated; Content missing entirely - reference has full Content | Transliteration of the slang term instead of translating the meaning or using the established English equivalent 'OpenClaw'. |
| google/gemma-3-1b-it | What should I do if the small shrimp doesn't send messages? |
严重
[准确性]
"small shrimp"
理由: Literal translation of '小龙虾' (OpenClaw). Fails to recognize the entity. | "小龙虾" is OpenClaw, not a literal shrimp. | Incorrect translation - '小龙虾' (OpenClaw) translated as 'small shrimp', missing details about 'typing state' and 'status normal' | Should be 'OpenClaw' not 'small shrimp'. '小龙虾' is slang for OpenClaw |
| google/gemma-3-4b-it | What to do if crayfish doesn't respond? |
严重
[准确性]
"crayfish"
理由: "小龙虾" is a nickname for OpenClaw here, not literal crayfish. | The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Crayfish' is an incorrect literal translation. Content is missing. | 'crayfish' should be 'OpenClaw'. Missing content about the typing state issue and status checking. Incomplete translation. |
| google/translategemma-12b-it | Title: What to do when Xiaolongxia (crayfish) doesn't reply to messages? Content: As shown in the picture, Xiaolongxia suddenly often stops replying to messages and gets stuck in the typing state, then nothing happens. The status check shows that everything is normal. |
严重
[准确性]
"Xiaolongxia (crayfish)"
理由: Incorrect: 'Xiaolongxia (crayfish)' should be 'OpenClaw'. The source uses 小龙虾 as slang for OpenClaw, not as a name. | Failure to recognize '小龙虾' as the proper noun/slang for 'OpenClaw', translating it as the animal instead. | Should be 'OpenClaw' not 'Xiaolongxia/crayfish'. The source uses this as a codename for OpenClaw. |
| google/translategemma-4b-it | Title: What to do if crayfish doesn't respond? |
严重
[准确性]
"crayfish"
理由: ‘小龙虾’ here refers to OpenClaw instance; literal ‘crayfish’ is incorrect in this technical/meme context. | The hypothesis translates only the Title and omits the Content. It translates '小龙虾' literally as 'crayfish' instead of 'OpenClaw'. | Missing translation - content section with technical details not translated | '小龙虾' is slang for 'OpenClaw', not literal crayfish. Should be 'OpenClaw' throughout. |
| tencent/HY-MT1.5-1.8B | "en": "What should I do if the crayfish doesn't respond to messages?", "zh-TW": "如果小龙虾不再回复消息该怎么办?", "ja": "小龙虾がメッセージに返信しなくなったらどうすればいい?", "ko": "만약 크릴리가 메시지에 답장하지 않으면 어떻게 해야 할까?"), "de": "Was gibt es zu tun, wenn Krebse keine Antworten mehr auf Nachrichten geben?"), "fr": "Que faire si les crevettes ne répondent plus aux messages?"), "it": "Cosa fare se i gamberi non rispondono più alle notifiche?"} | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | What should I do if the crayfish doesn’t respond to messages? |
严重
[准确性]
"crayfish"
理由: “小龙虾” refers to the OpenClaw service, not the animal; the literal translation “crayfish” is misleading. | Should be 'OpenClaw' not 'crayfish'. '小龙虾' is slang for OpenClaw. | Incorrectly translates the slang '小龙虾' (OpenClaw) as the animal 'crayfish'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Docker installation issues with small scorpion shrimp: Error response from daemon: cannot join network namespace of container |
严重
[准确性]
"small scorpion shrimp"
理由: Nonsensical and incorrect term for “小龙虾”, which refers to OpenClaw; adds ‘scorpion’ and ‘shrimp’ erroneously. | Should be 'OpenClaw' not 'small scorpion shrimp'; '小龙虾' refers to OpenClaw in this context | Mistranslation of '小龙虾' (OpenClaw) as literal 'small scorpion shrimp'. Content is missing. | '小龙虾' is slang for OpenClaw, should be 'OpenClaw' not literal translation 'small scorpion shrimp' |
| CohereLabs/tiny-aya-water | Title: Docker installation issues with small shrimp: Error response from daemon: cannot join network namespace of container |
严重
[准确性]
"small shrimp"
理由: Mistranslation: '小龙虾' is slang for 'OpenClaw', not 'small shrimp' | Mistranslation of '小龙虾'. This is slang for OpenClaw, not literal shrimp. Should be 'OpenClaw' | Loses reference to OpenClaw (小龙虾) as a specific project; reads as literal crustacean. | Mistranslated '小龙虾' (OpenClaw) as literal 'small shrimp'. |
| Qwen/Qwen2.5-14B-Instruct | Title: encountered an issue installing 小龙虾 with Docker: Error response from daemon: cannot join network namespace of container Content: Recently, I wanted to set up a 小龙虾 on my MacBook Pro M1 Max 64 at home and tried using Docker for installation. Following the documentation, here's what I did: |
严重
[准确性]
"小龙虾"
理由: Failed to translate the slang term '小龙虾' to 'OpenClaw'. | Mixing Chinese and English: 'set up a 小龙虾' should be 'set up an OpenClaw instance' | Should be translated as 'OpenClaw' not left in Chinese - this is slang terminology that should be translated |
| Qwen/Qwen2.5-7B-Instruct | Title: docker installation issue with LinuxDo community: Error response from daemon: cannot join network namespace of container Content: Recently, I tried to keep a small shrimp on my MacBook Pro M1 Max 64 and wanted to test docker installation. According to the documentation, I performed the following steps: |
严重
[准确性]
"LinuxDo community"
理由: “小龙虾” refers to OpenClaw instance; replacing it with ‘LinuxDo community’ is incorrect and changes the subject. | Hallucination. Source says 'docker 安装小龙虾' (installing OpenClaw via docker). Hypothesis inserts 'LinuxDo community' which is not in the source text. | Should be 'Encountered an issue installing OpenClaw with Docker' not 'LinuxDo community'. Incorrect reference. | Incorrect translation of '小龙虾' - should be 'OpenClaw', not 'LinuxDo community'. The source uses '小龙虾' as slang for OpenClaw.
严重
[准确性]
"small shrimp"
理由: Incorrect translation of '小龙虾' - should be 'host an OpenClaw instance', not 'keep a small shrimp'. Also 'I performed the following steps' vs 'I did the following' - minor style. | Should be 'host an OpenClaw instance' not 'keep a small shrimp'. | Mistranslation of '养一只小龙虾' (host an OpenClaw instance). Literal translation makes no sense in context. | Again literalizes slang; should be OpenClaw instance or similar, not an animal. |
| Qwen/Qwen3-14B | ...} |
致命
[准确性]
"...}"
理由: Hypothesis is truncated and does not translate either the title or the content about Docker installation error and hosting OpenClaw on MacBook. | Hypothesis is incomplete/malformed. Should contain proper translation of the source content. | Incomplete output; fails to provide any meaningful translation of Title or Content. |
| Qwen/Qwen3.5-4B | Title: Encountered an issue installing Docker Crayfish: Error response from daemon: cannot join network namespace of container Content: Recently tried to raise a crayfish on my home MacBook Pro M1 Max 64 and wanted to test Docker installation. According to the documentation, I performed the following steps: |
严重
[准确性]
"Docker Crayfish"
理由: 'Crayfish' should be 'OpenClaw' - the source '小龙虾' is a slang term for OpenClaw, not literal crayfish. Reference correctly uses 'OpenClaw'. | Mistranslation of 'docker 安装小龙虾'. It means installing the software 'OpenClaw' via Docker, not a product named 'Docker Crayfish'. | “小龙虾” refers to OpenClaw; term “Crayfish” misidentifies the software. |
| Qwen/Qwen3.5-9B | Title: Encountered issues installing Xiaolongxia via Docker: Error response from daemon: cannot join network namespace of container Content: Recently, I wanted to raise a Xiaolongxia on my MacBook Pro M1 Max 64 at home and tried setting up Docker. Following the documentation, I performed the following steps: |
严重
[准确性]
"raise a Xiaolongxia"
理由: Incorrect: 'raise a Xiaolongxia' should be 'host an OpenClaw instance'; 'setting up Docker' should be 'the Docker installation' | Incorrect terminology. '小龙虾' in this context is slang for OpenClaw, not 'Xiaolongxia'. Reference correctly translates as 'OpenClaw' | Transliteration of slang term instead of using the referent 'OpenClaw'. | Here “养一只小龙虾” means to run/host an OpenClaw instance; literal “raise a Xiaolongxia” (an animal) misrepresents the technical deployment activity. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Docker Installation Issues with Small Shrimp on MacBook Pro M1 Max 64 |
致命
[准确性]
"Small Shrimp"
理由: Misinterprets 小龙虾 which denotes OpenClaw instance, not literal shrimp. | Critical mistranslation: '小龙虾' in this context refers to 'OpenClaw', not literal 'Small Shrimp'. The reference correctly uses 'OpenClaw'. This is a terminology error - '小龙虾' is a slang term for OpenClaw in the Chinese tech community. | Should be 'OpenClaw' not 'Small Shrimp' - '小龙虾' refers to OpenClaw in this technical context | Mistranslation of slang '小龙虾' (OpenClaw). Literal translation is incorrect in this context. |
| google/gemma-3-12b-it | Title: Problem installing Xiaolongxia with Docker: Error response from daemon: cannot join network namespace of container Content: Recently, I wanted to raise a Xiaolongxia on my Macbook Pro M1 Max 64 at home and try installing it with Docker. According to the documentation, I performed the following operations: |
严重
[准确性]
"Xiaolongxia"
理由: Mistranslation: 'Xiaolongxia' should be 'OpenClaw' - the Chinese '小龙虾' is a nickname for OpenClaw, not transliteration | Transliteration of the slang term '小龙虾' instead of using the established software name 'OpenClaw'. | Nickname for OpenClaw; using pinyin without clarifying the software loses explicit product reference compared to reference. |
| google/gemma-3-1b-it | Docker installation problems with small shrimp: Error response from daemon: cannot join network namespace of container |
严重
[准确性]
"small shrimp"
理由: Incorrect translation - '小龙虾' (OpenClaw) translated as 'small shrimp', rest is technical content which is fine | Misrenders "小龙虾" which refers to OpenClaw. | Literal translation of '小龙虾' (OpenClaw). | Should be 'OpenClaw' not 'small shrimp'. '小龙虾' is slang for OpenClaw |
| google/gemma-3-4b-it | Title: Docker installation of crayfish encountered problems: Error response from daemon: cannot join network namespace of container |
严重
[准确性]
"crayfish"
理由: The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Crayfish' is an incorrect literal translation. Content is missing. | Again, "小龙虾" refers to OpenClaw, not a literal crayfish. | 'crayfish' should be 'OpenClaw'. Missing content about MacBook Pro M1 Max and the Docker installation steps. Incomplete translation. |
| google/translategemma-12b-it | Title: Issues encountered during Xiaolongxia installation via Docker: Error response from daemon: cannot join network namespace of container Content: Recently, I wanted to run Xiaolongxia on my Macbook Pro M1 Max 64 at home and decided to try installing it using Docker. According to the documentation, I performed the following steps: |
严重
[准确性]
"Xiaolongxia"
理由: Transliterated the slang term '小龙虾' instead of using the established English equivalent 'OpenClaw' used in the community/context. | Should be 'OpenClaw' not 'Xiaolongxia'. The source uses this as a codename for OpenClaw. | Incorrect: 'Xiaolongxia' should be 'OpenClaw'. The source uses 小龙虾 as slang for OpenClaw. |
| google/translategemma-4b-it | Title: docker installation of crayfish encountered problems: Error response from daemon: cannot join network namespace of container |
严重
[准确性]
"crayfish"
理由: Again, metaphorical ‘小龙虾’ for OpenClaw; literal ‘crayfish’ is wrong in context. | The hypothesis translates only the Title and omits the Content. It translates '小龙虾' literally as 'crayfish' instead of 'OpenClaw'. | Missing translation - content section with technical details not translated | '小龙虾' is slang for 'OpenClaw', not literal crayfish. Should be 'OpenClaw' throughout. |
| tencent/HY-MT1.5-1.8B | Translation of 'Docker Installation Troubleshooting for Catfish': Error Response from Daemon: Unable to Join Network Namespace of Container |
严重
[术语]
"Docker Installation Troubleshooting for Catfish"
理由: Translated the slang '小龙虾' (OpenClaw) incorrectly as 'Catfish'. The format is also a meta-prompt string. | Mistranslates 小龙虾 as catfish; in this forum context it refers to OpenClaw instance. | Incorrect translation of '小龙虾' as 'Catfish' instead of 'OpenClaw'.
严重
[准确性]
"Catfish"
理由: Failed to translate the content section. The output is a malformed title translation prompt. | Source uses '小龙虾' which is slang for 'OpenClaw', not 'Catfish'. The hypothesis uses the wrong animal term. | Mistranslation: '小龙虾' (crayfish/lobster) should be 'OpenClaw', not 'Catfish'. The title is incorrectly translated. |
| tencent/HY-MT1.5-7B | "en": "Title: Issues encountered while installing crayfish using Docker: Error message from the daemon: 'Cannot join network namespace of container'", "zh-TW": "Title: Issues encountered while installing crayfish using Docker: Error message from the daemon: 'Cannot join network namespace of container'", "ja": "Title: 使用Docker安装小龙虾时遇到的问题:守护进程的错误消息:‘无法加入容器的网络命名空间’", "ko": "Title: 사용Docker 설치 크레이피시 시 발생한 문제들: 데몬의 오류 메시지: ‘컨테이너의 네트워크 이름 공간에 가입할 수 없습니다’", "de":="Titel: Probleme bei der Installation von Krabben mit Docker: Fehlermeldung des Daemons: ‘Kann nicht dem Netzwerknamensraum des Containers beitreten’", "fr": "Titre : Problèmes rencontrés lors de l’installation de crevettes avec Docker : Message d’erreur du daemon : ‘Ne peut pas rejoindre l’espace de nommage réseau du conteneur’", "it": "Titolo: Problemi riscontrati durante l’installazione delle gamberette con Docker: Messaggio di errore dal demone: ‘Non è possibile unirsi allo spazio di nome della rete del contenitore’"} |
严重
[准确性]
"crayfish"
理由: “小龙虾” refers to OpenClaw; the literal translation “crayfish” is wrong. | Should be 'OpenClaw' not 'crayfish'. '小龙虾' is slang for OpenClaw. | Incorrectly translates the slang '小龙虾' (OpenClaw) as the animal 'crayfish'. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Have you encountered CC SWITCH crash issues? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Is there any specific reason why this issue occurs? Any workarounds to avoid this problem? |
致命
[准确性]
"Is there any specific reason why this issue occurs? Any workarounds to avoid this problem?"
理由: Hypothesis is completely missing the title and provides only partial content. Should include title 'Has anyone encountered the CC-Switch crash issue?' and full content translation | Complete hallucination. The hypothesis does not translate the source text at all, but generates generic questions. | Completely different content from source; the original asks if others have seen CC-Switch crashing on Windows with WSL+Codex, while hypothesis is generic questions without environment or app details. | Completely different translation: Title and Content are both mistranslated; should be about 'CC SWITCH crash issue' not generic question |
| Qwen/Qwen2.5-14B-Instruct | Title: Has anyone encountered an issue where CC SWITCH crashes suddenly? Content: Environment: Windows system Recently using cc switch + WSL + Codex, sometimes during conversation I find that the connection is interrupted and see that cc switch has crashed. Has anyone experienced similar issues? What might be the cause? |
轻微
[风格]
"Has anyone encountered an issue where CC SWITCH crashes suddenly?"
理由: Different from reference: 'crashes suddenly' vs 'crash issue' | Should be 'Has anyone encountered the CC-Switch crash issue?' - more concise and natural | Slightly verbose compared to the more natural 'Has anyone encountered the CC-Switch crash issue?'. |
| Qwen/Qwen2.5-7B-Instruct | Has anyone encountered issues with CC SWITCH crashing? |
严重
[准确性]
"Has anyone encountered issues with CC SWITCH crashing?"
理由: Missing 'Title:' label. Also 'CC SWITCH' should be 'CC-Switch' with hyphen to match reference. | Missing the entire content paragraph describing the environment and specific issue details. | Incomplete translation. Only translates title, missing content. Reference provides full translation including environment details and context. |
| Qwen/Qwen3-14B | Qualcuno ha mai avuto problemi di crash con CC SWITCH? |
严重
[准确性]
"Qualcuno ha mai avuto problemi di crash con CC SWITCH?"
理由: Omission of the 'Content' section detailing the environment and specific symptoms. | Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. | Title is fine but omits the hyphen styling “CC-Switch”; still understandable. |
| Qwen/Qwen3.5-4B | Title: Has anyone encountered CC SWITCH crashing issues? Content: Environment: Windows system Recently used cc switch + wsl + codex, sometimes noticed connection interrupted halfway through conversation, checked and found cc switch crashed. Have any experienced similar problems, what might be the cause? | 通过 (无共识错误) |
| Qwen/Qwen3.5-9B | Title: Has anyone encountered CC SWITCH crashing issues? Content: Environment: Windows system. Recently used cc switch + WSL + Codex, sometimes found that the connection was interrupted halfway through a conversation, and upon checking, CC Switch had crashed. Have any experienced users encountered similar problems, and what might be the cause? |
轻微
[风格]
"Have any experienced users"
理由: Adds “experienced users” which is not in the source; slight register change though intent is similar. | Less natural phrasing. Source uses '有佬' which is internet slang. Reference translates as 'Has any bro encountered' which is more idiomatic | Reference uses 'Has any bro encountered similar issue? What might be the reason?' - singular form and 'bro' instead of 'experienced users' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Have you encountered CC SWITCH crash issues on Windows systems when using cc switch + wsl + codex? Sometimes the connection drops mid-conversation and CC SWITCH crashes. Has anyone else experienced similar problems? What might be the cause? | 通过 (无共识错误) |
| google/gemma-3-12b-it | Has anyone encountered CC SWITCH crashing? |
严重
[准确性]
"Has anyone encountered CC SWITCH crashing?"
理由: Only Title translated; Content missing entirely - reference has full Content with environment details | Omission of the entire 'Content' section of the source text. | Hypothesis only provides title translation, missing the entire content |
| google/gemma-3-1b-it | Does anyone experience issues with CC Switch flashing? |
严重
[准确性]
"issues with CC Switch flashing"
理由: Mistranslation of '闪退' (crash/close unexpectedly) as 'flashing' (visual light). Completely changes the technical issue. | Should be 'crash' not 'flashing'. '闪退' means crash/sudden exit, not flashing | "闪退" means app crashes/exits unexpectedly, not flashing; mistranslation of the key issue. | Incorrect translation - '闪退' (crash) translated as 'flashing', should be 'crash' |
| google/gemma-3-4b-it | Has anyone encountered CC SWITCH crashing issues? | 通过 (无共识错误) |
| google/translategemma-12b-it | Has anyone encountered CC SWITCH crashing issues? | 通过 (无共识错误) |
| google/translategemma-4b-it | Has anyone encountered CC SWITCH crashing issues? |
轻微
[术语]
"CC SWITCH"
理由: The term should be hyphenated as 'CC-Switch' and properly cased. | Inconsistent formatting - reference uses 'CC-Switch' with hyphen | Reference uses ‘CC-Switch’; missing hyphen is a minor terminology inconsistency. |
| tencent/HY-MT1.5-1.8B | Translation of '有人遇到过CC SWITCH闪退问题吗?' in en |
致命
[准确性]
"Translation of '有人遇到过CC SWITCH闪退问题吗?' in en"
理由: Meta description only, with no actual translation. Content sentence is also completely omitted. | Hypothesis is a placeholder, not an actual translation. Content translation is also missing. | The hypothesis is a meta-prompt string. It fails to translate the title or the content. |
| tencent/HY-MT1.5-7B | Title: Has anyone encountered problems with CC Switch crashing? | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title:公益cc seems to be quite unstable and uses up resources quickly, how does codex 5.3 compare? | 通过 (无共识错误) |
| CohereLabs/tiny-aya-water | Title:公益cc seems to be quite unstable and uses up a lot of resources, how does codex 5.3 compare? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: It seems that public cc is quite unstable and consumes resources quickly. How does codex 5.3 compare to cc? Content: The current site is too fast and not sufficient. There are also occasional restrictions. The public codex site appears to be much more stable. I wonder how they compare. What do you think? |
严重
[术语]
"public cc"
理由: Inconsistent: 'public cc' should be 'Free Claude Code APIs' to match reference | Ambiguous; 'Free Claude Code APIs' clarifies the slang '公益cc'. | Ambiguous; source implies ‘公益cc’ as free Claude Code APIs, whereas ‘public cc’ is unclear.
严重
[准确性]
"The current site is too fast and not sufficient."
理由: Should be 'The Any site consumes quota too fast, it's not enough' - incorrect interpretation | Misinterprets ‘用的还是太快了,不够用’ which refers to quota being used up quickly, not the site being ‘too fast’ in speed. | Misinterpretation of 'any 站用的还是太快了'; refers to quota consumption speed, not the site's performance speed. | Incorrect: 'too fast' should be 'consumes quota too fast' - context missing |
| Qwen/Qwen2.5-7B-Instruct | Title: Charity cc seems to be quite unstable and used up quickly too, how about codex 5.3 compared to cc? Content: Any station is still too fast, not enough to use. And there seem to be occasional restrictions. The codex charity station seems to have more and is much more stable. I just wonder how they compare. What do you think? | 通过 (无共识错误) |
| Qwen/Qwen3-14B | I CC pubblici sembrano molto instabili e consumano risorse velocemente. Come si confronta Codex 5.3 con il CC? |
严重
[准确性]
"i CC pubblici"
理由: Slightly vague compared to “Claude Code 免费站 / free CC proxy” but generally understandable as public CC services. | Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. | Omission of the 'Content' section comparing specific sites and stability. |
| Qwen/Qwen3.5-4B | Title: Public welfare cc seems basically very unstable and gets used up quickly anyway. How does codex 5.3 compare to cc? Content: Any site I use is too fast and not enough. Also sometimes restricted. The public welfare station for codex seems to have more and be much more stable. Just don't know how they compare. What do you think? |
轻微
[准确性]
"any site"
理由: Unclear reference. Should clarify what 'Any site' refers to in context | Misinterpretation. 'any 站' likely refers to a specific site named 'Any' or 'any.io', or is a typo for a specific provider, but 'Any site I use' changes the meaning slightly. | 'too fast' mistranslates '用的太快' which means 'consumes quota too fast'. 'public welfare station' is awkward; reference uses 'free API proxy site'. Also 'codex' should be 'Codex'. | “any站” is a specific site name (Any); hypothesis reads as generic “any site”, changing the meaning. |
| Qwen/Qwen3.5-9B | Title: Public service cc seems mostly unstable and runs out very quickly. How does codex 5.3 compare to cc? Content: The usage on any site is still too fast, not enough. And there are occasional restrictions. There seem to be more public service stations for codex, which also appear more stable. I just don't know how they compare. What do you all think? |
严重
[准确性]
"Public service cc"
理由: Incorrect: 'Public service cc' should be 'Free Claude Code APIs'; 'cc' should be 'Claude Code' - these are abbreviations for specific services | Incomplete terminology. Should clarify this refers to 'Free Claude Code APIs'. Reference provides clearer context | “公益cc” refers specifically to free Claude Code APIs, not a generic “public service cc”; this loses the specific product meaning. | Failed to expand slang 'cc' to 'Claude Code' and '公益' to 'Free API/proxy', leading to confusion.
轻微
[准确性]
"any site"
理由: Failed to recognize 'any 站' as a specific proper noun (AnyAPI site) or context-specific reference. | “any站” is a particular site/service name; translating as generic “any site” removes that reference. | Multiple errors: 'any site' should be 'Any site' (proper name); 'public service stations' should be 'free API proxy sites'; 'codex' should be 'Codex' (proper name); 'consume quota' vs 'runs out' - different meaning |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The public welfare 'cc' seems quite unstable and is used very quickly. How does codex 5.3 compare to 'cc'? |
严重
[准确性]
"The public welfare 'cc' seems quite unstable and is used very quickly. How does codex 5.3 compare to 'cc'?"
理由: Hypothesis is content not title. Title should be translated. Also 'cc' should be 'Claude Code' for clarity | Mistranslation of '公益'. In this context, it means 'free' or 'community-provided', not 'public welfare' (charity). | Mistranslation: '公益cc' refers to 'free Claude Code APIs' (free proxy services for Claude Code), not 'public welfare cc'. The reference correctly uses 'Free Claude Code APIs'. |
| google/gemma-3-12b-it | Title: Public cc seems to be very unstable and uses resources quickly. How does Codex 5.3 compare to cc? Content: The sites still use too much resources, not enough. And there are occasional restrictions. There seem to be many more public codex sites, and they are also much more stable. I just don’t know how they compare. What do you all think? | 通过 (无共识错误) |
| google/gemma-3-1b-it | Any station uses are too fast, not enough. |
严重
[准确性]
"Any station uses are too fast, not enough."
理由: Incomplete translation - missing context about 'free Claude Code APIs', 'Codex', 'restricted', 'stable', and comparison question | Incomplete and unclear translation. Should include full context about quota consumption and stability comparison | Partial and inaccurate translation of the source content. | Does not fully convey that quota on the Any site is consumed too fast and is insufficient; also omits the instability and comparison with Codex sites from context. | Severe truncation. Only translated a fragment of the Content. Missed the Title entirely. Missed the comparison with Codex and the question to the community. |
| google/gemma-3-4b-it | Title: Public cc seems quite unstable and used up quickly, how does it compare to Codex 5.3? |
严重
[准确性]
"Public cc"
理由: Source refers to free Claude Code (cc) proxy APIs; "Public cc" is vague and loses the "Claude Code" meaning. | The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). Content is missing. | Inaccurate terminology - 'Public cc' should be 'Free Claude Code APIs'. 'used up quickly' should be 'consume quota very fast'. Missing content about API proxy sites and stability comparison. |
| google/translategemma-12b-it | Title: Public cc seems to be very unstable and uses resources quickly. How does Codex 5.3 compare to cc? |
严重
[准确性]
"uses resources quickly"
理由: Incomplete translation. Missing content about quota consumption and stability comparisons. Reference clarifies 'Free Claude Code APIs' and 'Claude Code' terminology. | Omission of the 'Content' section discussing specific site performance and restrictions. | Slightly vague compared to "consume quota very fast"; still understandable but less precise. | 'cc' should be 'Claude Code' (the full name). The source uses 'cc' as abbreviation for Claude Code. |
| google/translategemma-4b-it | Title: Public cc seems to be quite unstable and used quickly, how does codex 5.3 compare to cc? |
严重
[准确性]
"Public cc"
理由: The hypothesis translates only the Title and omits the Content. It fails to expand 'cc' to 'Claude Code' or 'free API' as done in the reference, reducing clarity. | Inaccurate terminology. 'cc' should be 'Claude Code' or 'free Claude Code APIs'. Missing content section. 'codex 5.3' context unclear without full content. | ‘公益cc’ refers to free Claude Code APIs; ‘Public cc’ is vague and misses ‘Claude Code’. | Missing translation - content section with detailed comparison not translated |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '公益cc好像基本都很不稳定而且用的很快啊,codex 5.3跟cc比起来怎么样呢?' into English", "zh-TW": "翻译为‘公益cc好像基本都很不稳定而且用的很快啊,codex 5.3跟cc比起来怎么样呢?’成中文”, "ja": "翻译为‘公益cc好像基本都很不稳定而且用的很快啊,codex 5.3跟cc比起来怎么样呢?’成日语”, "ko": "翻译为‘公益cc好像基本都很不稳定而且用的很快啊,codex 5.3跟cc比起来怎么样呢?’成韩语”, "de": "翻译为‘公益cc好像基本都很不稳定而且用的很快啊,codex 5.3跟cc比起来怎么样呢?’成德语”, "fr": "翻译为‘公益cc好像基本都很不稳定而且用的很快啊,codex 5.3跟cc比起来怎么样呢?’成法语”, "it": "翻译为‘公益cc好像基本都很不稳定而且用的很快啊,codex 5.3跟cc比起来怎么样呢?’成意大利语" }, "error": "无法找到匹配的JSON对象。"}\n | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | "en": "Title: Public welfare cc seems to be quite unstable and used up quickly. How does Codex 5.3 compare?", "zh-TW": "标题:公共福利cc似乎相当不稳定,使用速度也很快。Codex 5.3相比如何呢?", "ja": "タイトル:公共福祉ccはかなり不安定で、使い果たされるのが速いようです。Codex 5.3と比べるとどうでしょうか?", "ko": "제목: 공공복지 cc는 상당히 불안정하며 사용 속도도 매우 빠른 것 같습니다. Codex 5.3과 비교해서는 어떨까요?", "de":="Titel: Die öffentliche Wohlfahrts-CC scheint ziemlich instabil zu sein und wird schnell aufgebraucht. Wie vergleicht sich Codex 5.3 damit?", "fr":="Titre : La CC pour le bien public semble assez instable et s’épuise rapidement. Comment se compare-t-elle avec Codex 5.3 ?", "it":="Titolo: La CC per scopi di pubblica beneficenza sembra essere piuttosto instabile e viene utilizzata con rapidità. Come si compara con Codex 5.3?"}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: A small incident on the car Content: The story goes like this, I drove home from school and picked up my good friend along the way. On the way, he suddenly asked me if I had ever been in love. | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: A small incident on the car Content: The story goes like this, I drove home from school and picked up my good friend along the way. On the way, he suddenly asked me if I had ever dated. | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: A Little Incident on the Car Ride Content: It happened that I was driving home from school and picked up my good friend along the way. On the ride, he suddenly asked me if I had ever dated. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: A Little Interlude on the Car Ride Content: It was like this, I was driving back home from school and picked up my good friend along the way. On the road, he suddenly asked me if I had ever dated anyone. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Un piccolo episodio in auto | Major<br>[Accuracy]<br>"Un piccolo episodio in auto"<br>Reason: Only translates the title; omits full narrative about driving home from school, giving a classmate a ride, and being asked about romantic relationships. \| Omission of the 'Content' section telling the story about the classmate. \| Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. |
| Qwen/Qwen3.5-4B | Title: A Small Incident on the Car Content: Here's what happened. I was driving home from school and picked up my good friend along the way. Suddenly, he asked me if I had ever been in a relationship. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: A little incident on the car \| Content: Here's what happened. I was driving from school to home and picked up my good friend along the way. Suddenly he asked me if I had ever been in a relationship. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | A small incident while driving back home from school. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: A Little Incident on the Car Content: Here's what happened, I was driving home from school and gave my good friend a ride along the way. On the road, he suddenly asked me if I had ever been in love. | Pass (no consensus errors) |
| google/gemma-3-1b-it | A small song on the road. | Major<br>[Accuracy]<br>"A small song on the road."<br>Reason: Complete mistranslation - '小插曲' (little incident/episode) translated as 'small song', missing entire content about driving and relationship question \| Incorrect translation. '小插曲' means 'interlude' or 'incident', not 'song'. Should be 'A little interlude in the car' \| Hallucination/Mistranslation. '小插曲' means 'interlude' or 'incident', not 'song'. Complete omission of the Content story. \| "小插曲" means a small episode/interlude, not a song; mistranslation of idiom. |
| google/gemma-3-4b-it | A Little Incident on the Car | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: A little anecdote about a car ride Content: Here's what happened: I was driving home from school and gave my good friend a ride. On the way, he suddenly asked me if I had ever been in love. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: A Small Incident While Driving | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '一个在车上的小插曲' in en | Critical<br>[Accuracy]<br>"Translation of '一个在车上的小插曲' in en"<br>Reason: Hypothesis is a placeholder, not an actual translation. Content translation is also missing. \| The hypothesis is a meta-prompt string. It fails to translate the title or the content. \| Meta-text only, no actual translation. The whole narrative content is omitted. |
| tencent/HY-MT1.5-7B | A Small Incident on the Car | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Tips for quickly pasting Windows images when using WSL2 in CC and Opencode | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Tips for quickly pasting Windows images when using WSL2 in CC and Opencode | Minor<br>[Terminology]<br>"Tips for quickly pasting Windows images when using WSL2 in CC and Opencode"<br>Reason: Abbreviations left expanded inconsistently or without context compared to reference (Claude Code/OpenCode), though acceptable if standard. \| Incomplete: 'CC' should be 'Claude Code' and 'Opencode' should be 'OpenCode' for consistency with reference \| Loses explicit naming ‘Claude Code’ and ‘OpenCode’; otherwise meaning is preserved. |
| Qwen/Qwen2.5-14B-Instruct | Title: Tips for Quickly Pasting Windows Images in CC and Opencode When Using WSL2 Content: Tips for Quickly Pasting Windows Images in CC and Opencode Method 1: Basic Usage Select an image in Windows, then press Ctrl+C to copy In the WSL2 terminal, execute wslpath | Minor<br>[Terminology]<br>"CC and Opencode"<br>Reason: Incorrect: 'CC' should be 'Claude Code', 'Opencode' should be 'OpenCode' - proper names \| Reference uses ‘Claude Code and OpenCode’; abbreviation may be unclear without expansion but still mostly understandable. \| Should be 'Claude Code and OpenCode' - CC is abbreviation that should be expanded \| Failure to expand slang acronyms 'CC' (Claude Code) and 'Opencode' (OpenCode) for clarity, as done in the reference. |
| Qwen/Qwen2.5-7B-Instruct | Tips for Quickly Pasting Windows Images in CC and Opencode When Using WSL2 Method 1: Basic Usage Select an image in Windows, press Ctrl+C to copy Execute wslpath in the WSL2 terminal | Minor<br>[Terminology]<br>"CC and Opencode"<br>Reason: Should be 'Claude Code and OpenCode' for more precise terminology. \| Acronyms 'CC' and 'Opencode' are kept. Reference expands them to 'Claude Code' and 'OpenCode' for clarity, which is preferred if the full names are known in context. \| Reference uses full names ‘Claude Code and OpenCode’; abbreviation is acceptable but slightly less clear for some readers. \| Incorrect translation of 'CC' and 'Opencode' - should be 'Claude Code' and 'OpenCode' (proper names). Also missing 'Title:' label. |
| Qwen/Qwen3-14B | Consigli per incollare rapidamente immagini Windows in CC e Opencode quando si utilizza WSL2 | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Tips for Quickly Pasting Windows Images in CC and Opencode When Using WSL2 Content: Tips for Quickly Pasting Windows Images in CC and Opencode Method 1: Basic Usage Select an image in Windows, Ctrl+C to copy Execute wslpath in the WSL2 terminal | Minor<br>[Terminology]<br>"CC and Opencode"<br>Reason: 'Opencode' should be 'OpenCode' to match proper naming. Also 'CC' should be 'Claude Code' for clarity. \| Reference uses “Claude Code and OpenCode”; shortened “CC” is understandable but less explicit. \| Failed to expand abbreviations 'CC' (Claude Code) and 'Opencode' (OpenCode) for clarity, though acceptable if context is known. \| Should be 'Claude Code and OpenCode' not 'CC and Opencode'. Reference uses full proper names |
| Qwen/Qwen3.5-9B | Title: Tips for quickly pasting Windows images in CC and OpenCode when using WSL2 Content: Tips for quickly pasting Windows images in CC and OpenCode Method 1: Basic Usage Select an image in Windows, Ctrl+C to copy Execute wslpath in the WSL2 terminal | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Using WSL2, you can quickly paste Windows images in CC and Opencode | Pass (no consensus errors) |
| google/gemma-3-12b-it | Tips for quickly pasting Windows images in CC and OpenCode when using WSL2 | Major<br>[Accuracy]<br>"Tips for quickly pasting Windows images in CC and OpenCode when using WSL2"<br>Reason: Omission of the entire 'Content' section. Also, 'CC' and 'OpenCode' should likely be expanded to 'Claude Code' and 'OpenCode' for clarity as per reference. \| Only Title translated; Content missing entirely - reference has full Content with method details \| Hypothesis only provides title translation, missing the entire content. Also should be 'Claude Code' not 'CC' |
| google/gemma-3-1b-it | Tips for quickly pasting Windows images in CC and Opencode when using WSL2 | Minor<br>[Terminology]<br>"CC and Opencode"<br>Reason: Should be 'Claude Code and OpenCode' for clarity and consistency with reference \| Minor terminology issue - 'CC' should be 'Claude Code', 'Opencode' should be 'OpenCode' \| Acronyms used without expansion, whereas reference expands them for clarity (Claude Code, OpenCode). Acceptable but less precise. \| Does not expand product names (Claude Code and OpenCode) but still understandable; minor terminology inconsistency. |
| google/gemma-3-4b-it | Tips for quickly pasting Windows images in CC and Opencode when using WSL2 | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Tips for quickly pasting Windows images in CC and Opencode when using WSL2 Content: Tips for quickly pasting Windows images in CC and Opencode Method 1: Basic usage Select an image in Windows and copy it (Ctrl+C). In the WSL2 terminal, execute `wslpath`. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Tips for quickly pasting Windows images in CC and Opencode when using WSL2 | Minor<br>[Terminology]<br>"CC and Opencode"<br>Reason: Should be 'Claude Code and OpenCode' for clarity and proper terminology. \| Reference spells ‘Claude Code’ and ‘OpenCode’; abbreviation ‘CC’ and case variation may reduce clarity slightly. \| Incorrect - 'CC' refers to Claude Code, 'Opencode' should be 'OpenCode'; proper names should be preserved |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en | Critical<br>[Accuracy]<br>"Translation of '登录' in en"<br>Reason: Hypothesis is '登录' (Login in Chinese) which has no relation to the source content about pasting images in WSL2. This is completely incorrect. \| Hypothesis is completely incorrect - '登录' (login) has no relation to the source text about pasting Windows images. This appears to be a wrong input. \| Translates an unrelated word '登录' (login) instead of the given title. Entire actual source (tips about pasting images into CC and OpenCode) is not translated. \| The hypothesis is completely unrelated to the source text (translating 'Login' instead of the provided text about WSL2 and image pasting). |
| tencent/HY-MT1.5-7B | "en": "Title: Tips for Quickly Pasting Windows Pictures in CC and Opencode When Using WSL2", "zh-TW": "标题:使用 WSL2 时在 CC 和 Opencode 中快速粘贴 Windows 图片的技巧", "ja": "标题:使用 WSL2 时在 CC 和 Opencode 中快速粘贴 Windows 图片的技巧", "ko": "标题:使用 WSL2 时在 CC 和 Opencode 中快速粘贴 Windows 图片的技巧", "de":="Titel: Tipps für die schnelle Anbringung von Windows-Bildern in CC und Opencode beim Einsatz von WSL2", "fr": "Titre : Conseils pour coller rapidement des images Windows dans CC et Opencode lors de l'utilisation de WSL2", "it": "Titolo: Consigli per incollare rapidamente immagini Windows in CC e Opencode quando si utilizza WSL2"} | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Self-built New API site, which reverse proxy service should I choose? | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Self-built New API site, which reverse proxy service to choose? | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: Self-built New API Site, How Should Reverse Proxy Service Be Chosen? Content: I have always used NAS + CF Tunnel configuration, but recently found that long conversations seem to be getting truncated. How do you guys solve the penetration issue? | Minor<br>[Terminology]<br>"penetration issue"<br>Reason: Different from reference: 'penetration issue' vs 'NAT penetration issue' \| Missing technical specificity; 'NAT penetration issue' is the standard term for '穿透问题' in this context. \| Lacks specificity of ‘NAT penetration issue’; could be misread, though context suggests networking. \| Should be 'NAT penetration issue' - more precise technical terminology |
| Qwen/Qwen2.5-7B-Instruct | Title: Self-built New API Site, How to Choose Reverse Proxy Service? Content: I have always used nas + cf tunnel configuration, but recently found that long conversations seem to be truncated. How do the experts solve the穿透 problem? | Major<br>[Accuracy]<br>"穿透"<br>Reason: Leaves '穿透' in Chinese when it should be translated as 'NAT penetration issue'. Also 'nas' should be 'NAS' (capitalized). \| Untranslated Chinese character '穿透' (penetration/NAT penetration) left in the English text. \| Chinese text left untranslated in English output. Should be 'NAT penetration' based on context. |
| Qwen/Qwen3-14B | Titolo: Creare il proprio sito New API, come scegliere il servizio di reverse proxy? | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Setting up your own New API site, how to choose reverse proxy services? Content: I have always used NAS + CF Tunnel configuration, but recently discovered that long conversations seem to be truncated. How do the experts solve the penetration issue? | Minor<br>[Terminology]<br>"penetration issue"<br>Reason: Literal translation of '穿透'. In networking context, 'NAT penetration' or 'tunneling issue' is more precise. \| Should be 'NAT penetration issue' not just 'penetration issue'. Reference provides more specific terminology \| 'penetration issue' should be 'NAT penetration issue' as in reference for clarity. |
| Qwen/Qwen3.5-9B | Title: How to choose a self-hosted New API site and reverse proxy service? Content: I've always used nas + cf tunnel configuration, but recently noticed that long conversations seem to get truncated. How do you all solve the penetration issue? | Minor<br>[Terminology]<br>"penetration issue"<br>Reason: Literal translation of '穿透'; 'NAT penetration' is the more precise technical term in this networking context. \| Inaccurate terminology. Should be 'NAT penetration issue' or 'network penetration issue'. Reference correctly specifies 'NAT penetration issue' \| Source implies NAT traversal; “penetration issue” without “NAT” is vague and slightly awkward. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Building Your Own New API Site, How to Choose Reverse Proxy Services? | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: Building a New API Site, How to Choose a Reverse Proxy Service? | Major<br>[Accuracy]<br>"Title: Building a New API Site, How to Choose a Reverse Proxy Service?"<br>Reason: Omission of the entire 'Content' section. \| Only Title translated; Content missing entirely - reference has full Content about NAS + CF Tunnel setup \| Hypothesis only provides title translation, missing the entire content |
| google/gemma-3-1b-it | What should be done to select a reverse proxy service for a self-built New API site? | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: Setting up Your Own New API Site, How to Choose a Reverse Proxy Service? | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Building a New API Site from Scratch, How Should I Choose a Reverse Proxy Service? | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Choosing a reverse proxy service when setting up your own New API site. | Major<br>[Accuracy]<br>"Title: Choosing a reverse proxy service when setting up your own New API site."<br>Reason: Missing translation - content section with technical details about NAS/CF Tunnel and truncation issue not translated \| Should be 'self-hosting' \| The hypothesis translates only the Title and omits the Content. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 自建New API站点,反代服务应该怎么选?' in en | Critical<br>[Accuracy]<br>"Translation of 'Title: 自建New API站点,反代服务应该怎么选?' in en"<br>Reason: The hypothesis is a meta-prompt string. It fails to translate the title properly or the content. \| Hypothesis is a placeholder, not an actual translation. Content translation is also missing. \| Meta description only; does not provide an actual translation of the title or the content explanation about NAS + CF Tunnel and truncation. |
| tencent/HY-MT1.5-7B | "en": "Title: How to Choose a Reverse Proxy Service for Self-Built New API Sites?", "zh-TW": "Title: 如何选择自建新API站点的反向代理服务?", "ja": "Title: 自建新APIサイトのためのリバースプロキシサービスの選択方法は?", "ko": "Title: 자체 구축한 새로운 API 사이트를 위한 리버스 프록시 서비스 선택 방법은?」 , "de": "Title: Wie wählt man eine Reverse-Proxy-Service für selbst gebaute neue API-Sites aus?", "fr": "Titre: Comment choisir un service de reverse proxy pour des nouveaux sites API auto-construits ?", "it": "Titolo: Come scegliere un servizio di reverse proxy per nuovi siti API costruiti in casa?"}\|im_end\|> | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | If you find English reading tiring, you can use the 'Immersive Translation' extension on Chrome to search for 'Immersive Translation'. It uses free GLM translation which is much better than Microsoft and Google Translate. | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | If you find English reading tiring, you can use the 'Immersive Translation' extension on Chrome to search for 'Immersive Translation'. It uses free GLM translation inside, which is much better than Microsoft and Google Translate. | Minor<br>[Accuracy]<br>"search for 'Immersive Translation'"<br>Reason: Minor error: 'search for' should be 'search in' or 'enable'; redundant 'Immersive Translation' mentioned twice \| Process reversed: you search the extension name, not inside the extension; minor nuance loss. \| Missing title translation. The hypothesis only provides content, not the title '[Data Series 2] Complete Collection of all AI subreddits on Reddit [Hand-collected, long-term updates]' \| Slight deviation in instruction flow compared to source, but acceptable. |
| Qwen/Qwen2.5-14B-Instruct | Title: :fire: [Data Series 2] Comprehensive List of All AI Subreddits on Reddit [Manually Collected, Regularly Updated] Content: If you find reading English tiring, you can use the Chrome extension to search for 'Immersive Translation', and use the free GLM translation inside, which is much better than Microsoft or Google Translate. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: :fire:【资料系列2】Reddit all AI subreddits collection【Hand-collected, long-term updates】 Content: If you find English tiring to read, you can use the "沉浸式翻译" extension in Chrome and try the free GLM translation inside, which is much better than Microsoft or Google translations | Pass (no consensus errors) |
| Qwen/Qwen3-14B | :fire: [Serie di risorse 2] Elenco completo di tutti i subreddit AI su Reddit [raccolti manualmente, aggiornamenti a lungo termine] | Major<br>[Accuracy]<br>":fire: [Serie di risorse 2] Elenco completo di tutti i subreddit AI su Reddit [raccolti manualmente, aggiornamenti a lungo termine]"<br>Reason: Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. \| Title translation is accurate, but the fire emoji is rendered as text rather than the original symbol; minor stylistic discrepancy. \| Omission of the 'Content' section recommending the Chrome extension. |
| Qwen/Qwen3.5-4B | Title: 🔥 [Data Series 2] Complete Guide to All AI Sections on Reddit (Hand-collected, Long-term updates) Content: If you find English tiring to read, use "Immersive Translation" from Chrome extensions and utilize the free GLM translation within it. It is far superior to Microsoft or Google Translate. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | "en": "Title: :fire:[Complete Guide to All Reddit AI Subreddits][Hand-collected, Long-term Updates] Content: If you find English tiring, use 'Immersive Translation' from Chrome extensions for free GLM translation, which is much better than Microsoft or Google translations.", "zh-TW": "標題::fire:【資料系列2】Reddit 所有 AI 板塊大全【手工收集,長期更新】內容:覺得英文看很累,可以在 Chrome 擴充功能中搜尋「沉浸式翻譯」,使用內建免費的 GLM 翻譯,效果遠勝於微軟或谷歌翻譯。", "ja": "タイトル::fire:【資料シリーズ2】Reddit のすべての AI サブreddit 一覧【手作業で収集、長期的に更新予定】コンテンツ:英語を読むのが疲れる場合は、Chrome 拡張機能で「Immersive Translation」を検索し、無料で提供されている GLM 翻訳を使用してください。Microsoft や Google の翻訳よりもはるかに優れています。", "ko": "제목: :fire: [Reddit 모든 AI 커뮤니티 가이드][수동 수집, 장기 업데이트] 내용: 영어를 읽는 것이 지치신다면 크롬 확장 프로그램에서 'Immersive Translation'을 검색하여 무료로 제공되는 GLM 번역을 사용하세요. 마이크로소프트나 구글 번역보다 훨씬 더 좋습니다.", "de": "Titel: :fire:[Alle Reddit AI-Subreddits im Überblick][Manuell gesammelt, langfristige Aktualisierungen] Inhalt: Wenn Ihnen das Lesen von Englisch zu anstrengend ist, suchen Sie nach „Immersive Translation" in den Chrome-Erweiterungen und verwenden Sie die kostenlose GLM-Übersetzung, die deutlich besser ist als die von Microsoft oder Google.", "fr": "Titre : :fire:[Guide complet de tous les sous-forums AI de Reddit][Collecté manuellement, mises à jour régulières] Contenu : Si vous trouvez l'anglais fatiguant, utilisez « Immersive Translation » dans les extensions Chrome pour la traduction gratuite GLM, qui est bien meilleure que celle de Microsoft ou Google.", "it": "Titolo: :fire:[Guida completa a tutti i subreddit AI di Reddit][Raccolto manualmente, aggiornamenti a lungo termine] Contenuto: Se trovi l'inglese stancante, usa "Immersive Translation" dalle estensioni di Chrome per la traduzione gratuita GLM, che è molto migliore rispetto a quella di Microsoft o Google." } | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: :fire:【Series 2】All Reddit AI Subreddits Handbook【Hand-collected and regularly updated | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: 🔥[Data Series 2] Complete List of All AI Sections on Reddit [Manually Collected, Continuously Updated] | Pass (no consensus errors) |
| google/gemma-3-1b-it | Translation of 'Title: :fire:【资料系列2】Reddit所有AI板块大全【手工收集,长期更新】' in en | Critical<br>[Accuracy]<br>"Translation of 'Title: :fire:【资料系列2】Reddit所有AI板块大全【手工收集,长期更新】' in en"<br>Reason: Hypothesis is a meta-description, not an actual translation. Should be '🔥 [Data Series 2] Complete Collection of all AI subreddits on Reddit [Hand-collected, long-term updates]' \| Model failure. Outputted a meta-commentary instruction instead of performing the translation. Omitted Content entirely. \| Not a translation - placeholder text instead of actual translation, output is in English not Chinese as required \| Hypothesis is a translation instruction, not the actual translation. \| No translation provided; only a meta-description of the translation task. |
| google/gemma-3-4b-it | Title: 🔥【Series 2 - Resources】Comprehensive Collection of All AI Subreddits【Manually Collected, Continuously Updated】 Content: Tired of reading English? Use "Immersive Translation" in Chrome extensions to use the free GLM translator, which is much better than Microsoft and Google translation. | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: 🔥[Resource Series 2] Complete List of All AI Subreddits on Reddit [Manually Collected, Continuously Updated] Content: If you find reading English tiring, you can use the "Immersive Translation" Chrome extension and use the free GLM translation within it. It is much better than Microsoft or Google Translate. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: 🔥【Series 2: AI Resources】Comprehensive Guide to All Reddit AI Subreddits [Manually Collected, Continuously Updated] | Minor<br>[Accuracy]<br>"【Series 2: AI Resources】"<br>Reason: Source ‘【资料系列2】’ is closer to ‘Data Series 2’; ‘AI Resources’ adds extra specification not in source. \| Should be 'Data Series 2' \| The hypothesis translates only the Title and omits the Content. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '資料系列2】Reddit所有AI板块大全【手工收集,长期更新'"}, "zh-TW": "Translation of '資料系列2】Reddit所有AI板块大全【手工收集,長期更新'"}, "ja": "Translation of '資料系列2】Reddit所有AI板块大全【手工收集,長期更新'"}, "ko": "Translation of '資料系列2】Reddit所有AI板块大全【手工收集,長期更新'"}, "de": "Translation of '資料系列2】Reddit所有AI板块大全【手工收集,長期更新'"}, "fr": "Translation of '資料系列2】Reddit所有AI板块大全【手工收集,長期更新'"}, "it": "Translation of '資料系列2】Reddit所有AI板块大全【手工收集,長期更新'" } | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | "en": "Title: :fire:[Data Series 2] Comprehensive List of All AI Sections on Reddit [Manually Collected, Regularly Updated]", "zh-TW": "Title: :fire:[資料系列2] Reddit上所有AI版塊的完整列表[手工收集,定期更新]", "ja": "Title: :fire:[資料シリーズ2] RedditにあるすべてのAIセクションの完全なリスト[手動で収集され、定期的に更新されます]", "ko": "Title: :fire:[자료 시리즈2] Reddit의 모든 AI 섹션 목록[수동으로 수집되고 정기적으로 업데이트됩니다]", "de":="Titel: :fire:[Datenserie 2] Vollständige Liste aller AI-Sectionen auf Reddit [Manuell gesammelt, regelmäßig aktualisiert]", "fr":="Titre : :fire:[Série de données 2] Liste complète de toutes les sections AI sur Reddit [Collectée manuellement, mises à jour régulièrement]", "it":="Titolo : :fire:[Serie di dati 2] Elenco completo di tutte le sezioni AI su Reddit [Raccolto manualmente, aggiornato regolarmente]"}\|im_end\|> | Pass (no consensus errors) |

| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: (Question) Some Issues with Deploying Small Crabs | Major<br>[Accuracy]<br>"Small Crabs"<br>Reason: Mistranslation of '小龙虾' (OpenClaw) as literal 'Small Crabs'. Missing content about the lightweight version and panel features. \| Should be 'OpenClaw' not 'Small Crabs'; '小龙虾' refers to OpenClaw in this context \| '小龙虾' is slang for OpenClaw, should be 'OpenClaw' not literal 'Small Crabs' |
| CohereLabs/tiny-aya-water | Title: (Question) Some Issues with Deploying Small Crabs | Major<br>[Accuracy]<br>"Small Crabs"<br>Reason: Mistranslation: '小龙虾' is slang for 'OpenClaw', not 'Small Crabs' \| Mistranslation of '小龙虾'. This is slang for OpenClaw, not literal crabs. Should be '(Question) Some issues with deploying OpenClaw'. Also missing content translation. \| The translation of '小龙虾' as 'Small Crabs' is incorrect. It should be 'OpenClaw' as in the reference. \| Mistranslated '小龙虾' (OpenClaw) as literal 'Small Crabs'. \| Mistranslation of ‘小龙虾’ which refers to a lightweight OpenClaw instance, not literal small crabs. |
| Qwen/Qwen2.5-14B-Instruct | Title: (Issue) Some issues with deploying small lobsters Content: I remember there was a small lobster feature that works similarly to the big lobster (OpenClaw), but with lower occupancy. Is this project available in Chinese? And does it have a panel like the big lobster? | Major<br>[Accuracy]<br>"small lobsters"<br>Reason: Literal rendering of ‘小龙虾’; in this technical context it refers to a lightweight OpenClaw variant, not actual lobsters. \| Literal mistranslation of '小龙虾' referring to a lightweight version of OpenClaw. \| Should be 'lightweight OpenClaw' - 'small lobster' is slang that should be properly translated \| Incorrect: 'small lobsters' should be 'OpenClaw' - '小龙虾' is a nickname for OpenClaw<br>Minor<br>[Terminology]<br>"occupancy"<br>Reason: Different from reference: 'lower occupancy' vs 'lower resource usage' \| Non-idiomatic for computing; ‘resource usage’ or ‘resource consumption’ would match source meaning better. \| Incorrect term for resource usage; 'resource usage' is standard. |
| Qwen/Qwen2.5-7B-Instruct | Title: (Issue) Some issues about small shrimp deployment Content: I remember that there was a Small Shrimp and OpenClaw feature that were the same but had lower occupancy. Is there a Chinese version for this project? And similarly, it has a panel like the big shrimp. | Major<br>[Accuracy]<br>"small shrimp"<br>Reason: Should be 'deploying OpenClaw' not 'small shrimp deployment'. The slang should be translated to the actual product name. \| Mistranslation of '小龙虾' (OpenClaw) as 'small shrimp'. The source contrasts '小龙虾' (lightweight version) and '大龙虾' (full version), both referring to OpenClaw variants, not actual shrimp. \| “小龙虾” refers to lightweight OpenClaw; literal ‘small shrimp’ loses product identity. \| Incorrect translation of '小龙虾' - should be 'OpenClaw', not 'small shrimp'. Also 'Issue' should be 'Question' to match reference. |
| Qwen/Qwen3-14B | (Domanda) Alcuni problemi con il deploy del gambero | Major<br>[Accuracy]<br>"(Domanda) Alcuni problemi con il deploy del gambero"<br>Reason: Hypothesis is in Italian, but target language is Chinese (zh). Should be in Chinese. \| Only gives a broad title; omits details about lightweight OpenClaw with same functionality but lower resource usage, question on Chinese version, and having a panel like full OpenClaw. Also uses literal “gambero” instead of OpenClaw term. \| Mistranslation of '小龙虾' as 'gambero' (shrimp) instead of the tool name. Omission of 'Content' section. \| Mistranslation: '小龙虾' refers to OpenClaw (software), not 'gambero' (crayfish). Should be 'problemi con il deploy di OpenClaw'. |
| Qwen/Qwen3.5-4B | Title: (Issue) Some issues regarding the deployment of Little Crawfish Content: I recall there was a Little Crawfish and Big Crawfish (OpenClaw) that have the same functionality but consume less resources. Does this project support Chinese? Also, does it have a dashboard similar to OpenClaw? | Major<br>[Accuracy]<br>"Little Crawfish and Big Crawfish (OpenClaw)"<br>Reason: Literal translation of '小龙虾'. Refers to a lightweight version of the software 'OpenClaw', not a crustacean. \| Source contrasts lightweight OpenClaw vs full OpenClaw, not two separate crawfish products; translation introduces extra entities. \| Incorrect interpretation: source mentions '小龙虾和大龙虾(OpenClaw)' - '大龙虾' means 'big/OpenClaw', not 'Big Crawfish'. The source is asking about a lightweight version of OpenClaw, not a separate 'Little Crawfish' product. |
| Qwen/Qwen3.5-9B | Title: (Issue) Some issues regarding the deployment of Xiaolongxia Content: I remember there was a feature called Xiaolongxia that functions similarly to OpenClaw but has lower resource usage. Does this project have Chinese support? Also, does it come with a control panel like OpenClaw? | Major<br>[Accuracy]<br>"there was a feature called Xiaolongxia"<br>Reason: Source says “有一个小龙虾和大龙虾(OpenClaw)功能是一样的”, i.e., a lightweight OpenClaw implementation; treating Xiaolongxia as a ‘feature’ mischaracterizes it as a function instead of an alternative project/instance. \| Incorrect: 'Xiaolongxia' should be 'lightweight OpenClaw'; 'feature' should be 'a lightweight version' \| Transliteration of slang term '小龙虾' instead of identifying it as a variant of 'OpenClaw'. \| Incorrect terminology. '小龙虾' in this context refers to OpenClaw or a lightweight version of it, not 'Xiaolongxia'. Reference correctly translates as 'OpenClaw' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: (Issue) Some Issues with Small Shrimp Crabs Deployment | Critical<br>[Accuracy]<br>"Small Shrimp Crabs"<br>Reason: Mistranslation of slang '小龙虾' (OpenClaw). 'Small Shrimp Crabs' is a literal and confusing translation that loses the software reference. \| Adds ‘Crabs’ which is not in the source; 小龙虾 refers to a lightweight OpenClaw variant, not ‘shrimp crabs’. \| Should be 'OpenClaw' not 'Small Shrimp Crabs' - '小龙虾' is slang for OpenClaw. Also 'Crabs' is incorrect \| Critical mistranslation: '小龙虾' in this context refers to 'OpenClaw', not literal 'Small Shrimp Crabs'. The reference correctly uses 'OpenClaw'. This is a terminology error - '小龙虾' is a slang term for OpenClaw in the Chinese tech community. |
| google/gemma-3-12b-it | Title: (Issue) Some issues deploying Xiaolongxia Content: I remember that Xiaolongxia and OpenClaw had similar functionality but with lower resource usage. Does this project have Chinese documentation? Also, does it have a dashboard like OpenClaw? | Major<br>[Accuracy]<br>"Xiaolongxia and OpenClaw had similar functionality"<br>Reason: Transliteration of the slang term '小龙虾' instead of using the established software name 'OpenClaw' or 'lightweight OpenClaw'. \| Mistranslation: 'Xiaolongxia' should be 'OpenClaw' - the Chinese '小龙虾' is a nickname for OpenClaw, not transliteration \| Source distinguishes “小龙虾”和“大龙虾(OpenClaw)” as light vs full OpenClaw; translation treats Xiaolongxia as separate from OpenClaw rather than a lightweight variant. |
| google/gemma-3-1b-it | Translation of 'Title: (问题)' small shrimp deployment issues | Critical<br>[Accuracy]<br>"Translation of 'Title: (问题)' small shrimp deployment issues"<br>Reason: Model failure. Mixed meta-commentary with partial translation. 'small shrimp' is a literal mistranslation of '小龙虾' (OpenClaw). Complete omission of the Content text. \| Hypothesis is malformed meta-description. Should be '(Question) Some issues with deploying OpenClaw' \| Meta-statement instead of an actual translation; also omits OpenClaw reference and misuses "small shrimp" for OpenClaw. \| Not a proper translation - placeholder text mixed with incorrect translation, '小龙虾' translated as 'small shrimp' |
| google/gemma-3-4b-it | Title: (Issue) Some issues with small lobster deployment | Major<br>[Accuracy]<br>"small lobster"<br>Reason: The source text is in Chinese, but the hypothesis is in English. This is a complete failure to translate into the target language (zh). 'Small lobster' is a literal mistranslation of the slang '小龙虾' (OpenClaw). Content is missing. \| 'small lobster' should be 'OpenClaw'. Should be '(Question)' not '(Issue)'. Missing content about lightweight version, resource usage, Chinese version availability, and panel features. \| "小龙虾" is a lighter OpenClaw-like project, not a literal small lobster. |
| google/translategemma-12b-it | Title: (Issue) Some issues with Xiaolongxia deployment Content: I remember that there was a feature called Xiaolongxia that was similar to OpenClaw, but had lower resource usage. Does this project have Chinese documentation? Also, does it have a panel like OpenClaw? | Major<br>[Accuracy]<br>"feature called Xiaolongxia that was similar to OpenClaw"<br>Reason: Should be 'OpenClaw' not 'Xiaolongxia'. The source uses this as a codename for OpenClaw. \| Source says "a lightweight OpenClaw" with same features; hypothesis reinterprets it as a feature named Xiaolongxia, altering the product relationship. \| Failed to translate the slang '小龙虾' to its intended referent 'OpenClaw' (or a lightweight version thereof), treating it as a separate entity name or animal. |
| google/translategemma-4b-it | Title: (Issue) Some issues with deploying small lobsters | Major<br>[Accuracy]<br>"small lobsters"<br>Reason: ‘小龙虾’ is the lightweight OpenClaw variant; literal ‘small lobsters’ obscures the technical meaning and relation to OpenClaw. \| Missing translation - content section with technical details about resource usage and panel not translated \| '小龙虾' is slang for 'OpenClaw', not literal lobsters. Should be 'OpenClaw' throughout. Also missing content section. \| The hypothesis translates only the Title and omits the Content. It translates '小龙虾' literally as 'small lobsters' instead of 'OpenClaw'. \| Should be 'OpenClaw' |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
致命
[准确性]
"Translation of '登录' in en"
理由: Hypothesis is completely incorrect - '登录' (login) has no relation to the source text about OpenClaw deployment. This appears to be a wrong input. | Hypothesis is '登录' (Login in Chinese) which has no relation to the source content about deploying OpenClaw. This is completely incorrect. | Translates the wrong word ('登录') and ignores the given title. The actual question about deploying OpenClaw and a lightweight version is entirely omitted. | The hypothesis is completely unrelated to the source text (translating 'Login' instead of the provided text about OpenClaw deployment). |
| tencent/HY-MT1.5-7B | "en": "Title: Some Issues with the Deployment of Lobster Crabs", "zh-TW": "标题:小龙虾部署中遇到的一些问题", "ja": "标题:小龙虾部署中存在的一些问题", "ko": "제목: 소형 랍스터 크랩 배포와 관련된 몇 가지 문제", "de":="Titel: Einige Probleme bei der Bereitstellung von Kleineren Hummerkrabben", "fr": "Titre: Quelques problèmes concernant la mise en place des petits homards crabe", "it": "Titolo: Alcuni problemi relativi alla distribuzione dei piccoli granchi di aragosta"} |
致命
[准确性]
"Lobster Crabs / 小龙虾部署的一些问题"
理由: Should be 'OpenClaw' not 'Lobster Crabs'. '小龙虾' is slang for OpenClaw. | Completely fails to recognize '小龙虾' and '大龙虾' as slang for 'OpenClaw' (lightweight and full versions). Translates them as actual animals ('Lobster Crabs', 'Smaller Lobster Crabs'), making the text nonsensical. | “小龙虾” is another name for OpenClaw and should refer directly to the OpenClaw deployment issue, not to literal meanings such as “lobster crabs”. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Is there any need to do Mini on Windows? Can it be done? |
严重
[准确性]
"Is there any need to do Mini on Windows? Can it be done?"
理由: Original asks about 'Mini' (Mac mini) and Windows compatibility for OpenClaw, but hypothesis only mentions Windows | Mistranslates key terms: "Mini" = Mac mini, "小龙虾" = OpenClaw; drops mention of OpenClaw and question about what everyone is using. | Failed to translate the slang 'Mini' (referring to Mac mini) and the entity '小龙虾' (OpenClaw), resulting in a nonsensical sentence. | Hypothesis fails to identify that '小龙虾' is a slang term for 'OpenClaw' (a software tool), not literally about shrimp or Mac mini. The translation is completely inaccurate. |
| CohereLabs/tiny-aya-water | Title: Is there any need to do Mini on Little Crab? Can windows do that? |
致命
[准确性]
"Mini"
理由: Failed to translate slang 'Mini' (Mac mini) and '小龙虾' (OpenClaw), resulting in nonsensical literal translation. | Misinterprets implied product "Mac mini" as generic "Mini" losing specific meaning. | Completely mistranslates '小龙虾' (OpenClaw - a software tool) as 'Little Crab'. Also fails to recognize 'Mini' as Mac mini and misses the context about OpenClaw compatibility with Windows. |
| Qwen/Qwen2.5-14B-Instruct | Title: Is there a need to get Mini small crayfish? Can Windows do this? Content: What are you all using? |
轻微
[准确性]
"Can Windows do this?"
理由: Missing 'it' reference - should be 'Can Windows do it?' to match reference | Less precise than "Can Windows do it?" but meaning largely preserved. | The source 'windows 可以吗' implies capability ('Can Windows handle/run it?'), while 'do this' is slightly vague compared to the reference context. |
| Qwen/Qwen2.5-7B-Instruct | Is it necessary to have Mini Shrimp? Can Windows do it? |
严重
[术语]
"Mini Shrimp"
理由: Incorrect translation of 'Mini' (Mac mini) and '小龙虾' (OpenClaw). Should be 'Mac mini' and 'OpenClaw' respectively. | Mistranslates the product/name “小龙虾”, which in-context refers to OpenClaw/LinuxDo, not literal shrimp. | Incorrect translation of '龙虾' as 'Mini Shrimp' instead of 'Mac mini'.
严重
[准确性]
"Can Windows do it?"
理由: The translation fails to convey the original meaning. The source asks about running OpenClaw on Mac mini or Windows, not just 'having Mini Shrimp'. | Missed the entire content line '大家都用的什么?' (What is everyone using?). | Misses nuance that question is about running OpenClaw on Windows, not generic capability. |
| Qwen/Qwen3-14B | È necessario acquistare un Mini e un granchio? Si può usare Windows? |
严重
[准确性]
"Mini"
理由: The translation incorrectly translates 'Mini' as 'acquistare un Mini' instead of 'Mac mini'. | Misinterprets "Mini" which refers to Mac mini computer required for running OpenClaw, not a generic small device. | Failed to translate the slang term 'Mini' (referring to Mac mini) and left it untranslated, losing the specific hardware reference. | Wrong target language - hypothesis is in Italian but source is in Chinese and should be translated to English or another language. Also mistranslates '小龙虾' (OpenClaw/a software tool) as 'granchio' (crab) and 'Mini' is not explained.
轻微
[准确性]
"Si può usare Windows?"
理由: The translation misses the nuance of '可以吗' in the context of capability ('Can Windows do it?'), though 'Si può usare' is acceptable, the context implies running the specific tool mentioned. | Omitted nuance that question is about running that specific tool on Windows, not generic Windows usability. | Incomplete translation - misses the context that this is about whether Windows can run the software |
| Qwen/Qwen3.5-4B | Is there any need to make Mini and small crayfish? Can Windows be used? |
致命
[准确性]
"Mini and small crayfish"
理由: Completely misses the meaning. Should be 'Is it necessary to get a Mac mini to run OpenClaw?' The hypothesis fails to recognize the software context. | Misinterprets slang where “Mini” refers to Mac mini and “小龙虾” refers to OpenClaw software, not literal devices or animals. | Failed to translate the slang term '小龙虾' (OpenClaw) and 'Mini' (Mac mini), resulting in a nonsensical literal translation about cooking seafood. |
| Qwen/Qwen3.5-9B | Title: Is it necessary to get Mini and crayfish? Can Windows work? Content: What do everyone use? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Have you considered using Mini? Is it suitable for Windows? |
严重
[准确性]
"Mini"
理由: Misinterprets “小龙虾” (OpenClaw) as Mini; also drops Mac/computer implication from source/reference. | Failed to translate the slang term 'Mini' (referring to Mac mini) and '小龙虾' (OpenClaw), resulting in loss of specific meaning. | Hypothesis completely misses the context. Source mentions '小龙虾' (OpenClaw, a specific tool) and asks about necessity and Windows compatibility. Hypothesis translates as generic 'Mini' instead of the specific tool name, losing critical meaning. | Missing translation of '小龙虾' (OpenClaw) - 'Mini' is incorrect, should be 'Mac mini' as per reference. Also missing 'OpenClaw' entirely. |
| google/gemma-3-12b-it | Title: Is it necessary to make Mini and small lobsters? Can Windows do it? |
致命
[准确性]
"Mini and small lobsters"
理由: Misinterprets product names; should refer to Mac mini and OpenClaw, not literal mini and lobsters. | Failed to translate slang 'Mini' (Mac mini) and '小龙虾' (OpenClaw project), interpreting them literally as food items. | Mistranslation of '小龙虾' (xiaolongxia) - should be 'OpenClaw' (a software tool), not 'lobsters'. Also 'Mini' is unclear context. |
| google/gemma-3-1b-it | What do people use? |
致命
[准确性]
"What do people use?"
理由: Omits key information about Mini/Mac mini, OpenClaw (小龙虾), and Windows from both title and content, resulting in a severe under-translation. | The hypothesis completely omits the Title translation. It also fails to translate the slang terms 'Mini' (Mac mini) and '小龙虾' (OpenClaw), and misses the specific question about Windows compatibility. | Hypothesis only translates the content question, completely omitting the title which is the main part of the source. The title asks about necessity of getting Mini/Mac mini for running OpenClaw and Windows compatibility. | Missing 'Mac mini' and 'Windows' from the source. | Translation only covers Content, missing Title translation which asks about Mac mini and Windows compatibility for running OpenClaw |
| google/gemma-3-4b-it | Is it necessary to get Mini to handle small lobsters? Is Windows allowed? |
严重
[准确性]
"Mini"
理由: Failed to translate the slang 'Mini' (referring to Mac mini) and the product name '小龙虾' (OpenClaw), resulting in a nonsensical literal translation about handling small lobsters. | Mistranslation of '小龙虾' (xiaolong xia) - should be 'OpenClaw' (a software tool), not 'small lobsters' (literal translation). Also 'Mini' should be 'Mac mini'. | Refers to Mac mini computer for running OpenClaw, not some unspecified ‘Mini’ object
严重
[准确性]
"Is Windows allowed?"
理由: Incomplete translation. Should be 'Can Windows do it?' or 'Can Windows run it?' The reference shows this is asking about Windows compatibility. | The phrase 'windows 可以吗' asks about capability ('Can Windows do it?'), not permission ('Is Windows allowed?'). | Should ask if Windows can run it, not about being ‘allowed’ in a rules sense |
| google/translategemma-12b-it | Is it necessary to use Mini or small lobsters? Can Windows be used? |
致命
[准确性]
"Mini or small lobsters"
理由: Misinterprets slang/term for a specific tool/setup (e.g., Mac mini + OpenClaw) as literal lobsters, changing meaning. | Completely misses the meaning. Should be 'Is it necessary to get a Mac mini to run OpenClaw?' The source uses domain-specific terminology that was not properly decoded. | Failed to translate the slang 'Mini' (referring to Mac mini) and '小龙虾' (referring to OpenClaw software), interpreting them literally as crustaceans. |
| google/translategemma-4b-it | Is it necessary to use Mini to cook small lobsters? Can Windows be used? |
严重
[准确性]
"Mini to cook small lobsters"
理由: Should be 'run OpenClaw' | Failed to translate the slang term 'Mini' (referring to Mac mini) into the target context or retain the specific product name, treating it as a generic adjective. | Misinterprets technical slang 'Mini' (Mac mini) and '小龙虾' (OpenClaw) as cooking food instead of hardware/software context. |
| tencent/HY-MT1.5-1.8B | The question is whether it's necessary to handle mini lobsters in Windows? |
轻微
[准确性]
"The question is whether it's necessary to handle mini lobsters in Windows?"
理由: Complete mistranslation: '小龙虾' is slang for 'OpenClaw' (a software), not literal lobsters. 'Mini' refers to 'Mac mini'. The hypothesis completely misunderstood the source terminology. | Failed to translate the slang '小龙虾' (OpenClaw) and 'Mini' (Mac mini), interpreting them literally as crustaceans. | Misinterprets 'Mini' and '小龙虾' which refer to 'Mac mini' and 'OpenClaw', and omits mention of Mac; also ignores forum-question style about setup rather than handling lobsters. | Mistranslation of '小龙虾' (xiaolongxia) - in this context it refers to 'OpenClaw' (a software tool), not literal lobsters. Reference correctly identifies this as 'Mac mini' and 'OpenClaw'. |
| tencent/HY-MT1.5-7B | Is it necessary to use Mini for crayfish? Can Windows be used instead? |
严重
[准确性]
"Mini for crayfish"
理由: Mistranslation of '小龙虾' (xiǎo lóngxiā) - should be 'OpenClaw' (a software tool), not 'crayfish'. The source uses '小龙虾' as a codename/slang term for the software. | Misinterprets slang; should refer to Mac mini running OpenClaw, not literal crayfish. | Failed to translate the slang 'Mini' (referring to Mac mini) and the context of 'crayfish' (OpenClaw), resulting in a nonsensical literal translation. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 🎼 Today's Ultimate Solution for Longshous on GPT-5.4 |
严重
[准确性]
"Today's Ultimate Solution for Longshous on GPT-5.4"
理由: "小龙虾" should be OpenClaw, not transliterated as "Longshous"; title also omits idea of configuring/setting up in OpenClaw and focus on overly long outputs. | Mistranslation of the slang '小龙虾' (OpenClaw). 'Longshou' is a literal pinyin transliteration that is meaningless in English context; it should be translated as the community term 'OpenClaw'. | Content completely missing - reference includes full content about setting up GPT-5.4 in OpenClaw | Fails to translate '小龙虾' as 'OpenClaw'. Also misses the key context about 'overly long outputs' which is central to the title meaning. |
| CohereLabs/tiny-aya-water | Title: 🎼The Ultimate Guide to Long Output Solutions with GPT-5.4 and Little Crab |
轻微
[准确性]
"The Ultimate Guide to Long Output Solutions with GPT-5.4 and Little Crab"
理由: Incorrect translation of '小龙虾' as 'Little Crab' instead of 'OpenClaw' | Misinterprets as a guide; source is a personal status update about setting up GPT-5.4. Also mistranslates "小龙虾" (OpenClaw) as literal "Little Crab" and omits notion of "today" and experience. | Mistranslates '小龙虾' as 'Little Crab' instead of 'OpenClaw'. Also changes the meaning from 'Set up GPT-5.4 on OpenClaw' to 'with GPT-5.4 and Little Crab', losing the specific context. | Failed to translate the slang term '小龙虾' (OpenClaw) correctly. |
| Qwen/Qwen2.5-14B-Instruct | Title: 🎼Xiaolongxia Today Upgraded to Gpt5.4, Ultimate Solution for Long Outputs Content: Thanks to my friend today I got to use GPT5.4 in Xiaolongxia, it's really useful and smart, but... | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Title: 🎼Shrimp Prawn Today Upgraded to Gpt5.4, Ultimate Solution for Long Outputs Content: Thanks to my friend TuLan, I got to use GPT5.4 today. It’s really useful and smart, but... |
严重
[术语]
"Shrimp Prawn"
理由: Redundant and incorrect; source 小龙虾 refers to OpenClaw, not literal shrimp/prawn. | '小龙虾' is a slang for OpenClaw, not literal shrimp/prawn. Should be 'OpenClaw'. | Redundant translation of '小龙虾' as 'Shrimp Prawn' instead of 'GPT5.4'.
轻微
[准确性]
"Thanks to my friend TuLan"
理由: Misinterpreted '佬友' (bros/friends from the community) as a proper name 'TuLan'. | Should be 'Thanks to a bro'. '托佬友的福' means 'thanks to a friend/bro', not specifically 'TuLan'. | Name “托佬友” refers generically to a bro/friend, not a proper name ‘TuLan’. |
| Qwen/Qwen3-14B | 🎵 Oggi il Criceti ha aggiornato a Gpt5.4, la soluzione definitiva per output troppo lunghi |
轻微
[准确性]
"il Criceti"
理由: The translation incorrectly translates '小龙虾' as 'Criceti' instead of 'OpenClaw'. | Invents "Criceti" (hamsters), completely unrelated to "小龙虾" which refers to OpenClaw; core term mistranslated. | Wrong target language - hypothesis is in Italian. Also 'Criceti' (hamsters) is incorrect translation of '小龙虾' (OpenClaw) | Mistranslated the slang '小龙虾' as 'Criceti' (Hamsters), which is completely incorrect; it should be recognized as the tool name 'OpenClaw' or transliterated. |
| Qwen/Qwen3.5-4B | Title: 🎼 Little Crab Today Uploaded Gpt5.4, Ultimate Solution for Overly Long Outputs Content: Thanks to my friends today I was able to use GPT5.4, configured within Little Crab. It is truly useful and genuinely smart, though... |
严重
[术语]
"Little Crab"
理由: Should be 'OpenClaw' not translated as 'Little Crab' | Incorrect rendering of “小龙虾”, which in this community context refers to OpenClaw, not a literal crustacean and not “Little Crab”. | Incorrect translation of '小龙虾' as 'Little Crab' instead of 'OpenClaw'
严重
[准确性]
"Uploaded Gpt5.4"
理由: “上了GPT5.4” means set up / started using GPT‑5.4, not uploading it. | Misinterprets the meaning. Should be 'Set up GPT-5.4 on OpenClaw today'. The hypothesis incorrectly suggests the software uploaded something. | Should be 'set up' or 'configured' not 'uploaded' | '上了' in this context means 'set up' or 'deployed', not 'uploaded'.
致命
[准确性]
"within Little Crab"
理由: Should convey configuration within OpenClaw; current wording continues the incorrect literal animal reading. | Failed to translate the specific slang term '小龙虾' which refers to 'OpenClaw' in this context, translating it literally instead. | Should be 'OpenClaw', not 'Little Crab'. '小龙虾' is a slang term for the OpenClaw software tool. |
| Qwen/Qwen3.5-9B | Title: 🎼Crayfish today got on Gpt5.4, outputting the ultimate solution for too long Content: Today, thanks to my friend Lao Liu, I used GPT5.4 and configured it in Crayfish. It's truly useful and genuinely smart, but... |
严重
[术语]
"Crayfish"
理由: Should use the product name OpenClaw rather than literal crayfish. | Should be 'OpenClaw' not 'Crayfish' (slang term not translated) | Should be translated as 'OpenClaw' instead of 'Crayfish'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The title is '🎼 Small shrimp today climbed GPT5.4, output too long final solution'. The content is 'Today, the friend used GPT5.4 on behalf of the little dragon shrimp, and set it up nicely. It's really good to use and smart, but...' |
严重
[术语]
"Small shrimp / little dragon shrimp"
理由: Should be 'OpenClaw' instead of 'little dragon shrimp'. | Literalizes the nickname 小龙虾 instead of using the established product name OpenClaw as in the reference. | Literal translation of the slang '小龙虾' instead of the contextually correct 'OpenClaw'. |
| google/gemma-3-12b-it | Title: 🎼Lobster made it onto Gpt5.4 today, the ultimate solution for outputs that are too long Content: Thanks to a friend today, I was able to use GPT5.4 and set it up in Lobster. It’s really good, and it’s truly smart, but... |
严重
[术语]
"Lobster"
理由: Incorrect translation of '小龙虾' as 'Lobster' instead of 'OpenClaw' | Should be 'OpenClaw' not 'Lobster' - this is a project name/slang that should be preserved | Failed to identify '小龙虾' as the specific software project 'OpenClaw', translating it literally as the animal. |
| google/gemma-3-1b-it | Today, I got a good deal on GPT5.4, and the small shrimp was paired with it. It's really good, smart too. |
严重
[准确性]
"I got a good deal on GPT5.4"
理由: Misinterprets 托佬友的福 (‘thanks to a bro’s help/favor’) as getting a good deal; changes meaning. | Mistranslation of '托佬友的福用上了' (Thanks to a bro, I got to use). The source implies gratitude for access, not purchasing a deal. | Mistranslation of '托佬友的福' which means 'thanks to a bro' or 'thanks to a friend's help', not 'got a good deal'
严重
[术语]
"the small shrimp was paired with it"
理由: '小龙虾' here refers to the OpenClaw tool, not literal shrimp; incorrect technical/product term. | Failed to translate the slang '小龙虾' to its intended meaning 'OpenClaw', translating it literally instead. | Should be 'OpenClaw'. | Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'small shrimp' |
| google/gemma-3-4b-it | "en": "Title: 🎼Little Lobster Went Live on GPT5.4, Outputting Extremely Long Ultimate Solutions\nContent: Thanks to a friend, I used GPT5.4 today, and configured it in Little Lobster. It’s really effective, truly intelligent, but…” , "zh-TW": "標題:🎼小龍蝦今天上了GPT5.4,輸出太長的終極解法\n內容:今天因為朋友的關係,用上了GPT5.4,在小龍蝦裡配好了,真的很好用,真的很有智慧,不過……”", "ja": "タイトル:🎼小龍蝦がGPT5.4でライブ配信、超長文の究極ソリューションを出力\nコンテンツ:友人の助けを借りてGPT5.4を使ってみた。小龍蝦に設定して使ってみたところ、本当に効果があり、本当に賢いのだが……”", "ko": "제목: 🎼작은 게가 GPT5.4에서 라이브 방송을 진행하고, 매우 긴 궁극적 솔루션을 출력했습니다.\n내용: 친구 덕분에 GPT5.4를 사용해 보았는데, 작은 게에 설정하여 사용해 본 결과, 정말 효과가 있고, 정말 똑똑하지만……”", "de": "Titel: 🎼Kleiner Hummer streamt auf GPT5.4 und gibt extrem lange ultimative Lösungen aus\nInhalt: Dank eines Freundes habe ich heute GPT5.4 verwendet und es in Kleiner Hummer eingerichtet. Es ist wirklich effektiv, wirklich intelligent, aber……”", "fr": "Titre : 🎼Le petit crabe est allé en direct sur GPT5.4, produisant des solutions ultimes extrêmement longues\nContenu : Grâce à un ami, j'ai utilisé GPT5.4 aujourd'hui et l'ai configuré dans Petit Crabe. C'est vraiment efficace, vraiment intelligent, mais……”", "it": "Titolo: 🎼Il piccolo gamberetto è andato in diretta su GPT5.4, producendo soluzioni ultimate estremamente lunghe\nContenuto: Grazie ad un amico, ho usato GPT5.4 oggi e l'ho configurato in Piccolo Gamberetto. È davvero efficace, davvero intelligente, ma……”" } <|im_end|> |
严重
[准确性]
"Went Live on GPT5.4"
理由: Mistranslated '上了' (installed/set up on) as 'Went Live' (broadcasting), missing the technical context. | Meaning is ‘set up GPT‑5.4 on OpenClaw’, not that OpenClaw ‘went live on’ GPT | Mistranslation of '小龙虾' - should be 'OpenClaw', not 'Little Lobster'. This is a software tool name, not a literal animal. |
| google/translategemma-12b-it | Title: 🎼Today, Xiaolongxia (Lobster) was updated to GPT-5.4, and here's the ultimate solution for outputs that are too long. |
严重
[准确性]
"was updated to GPT-5.4"
理由: Mistranslates '小龙虾' as 'Xiaolongxia (Lobster)'. Should be 'OpenClaw' - a specific tool name in this community context. | Missing Content field from source | Source means configured/connected with GPT-5.4 inside the app, not that Xiaolongxia itself was updated. |
| google/translategemma-4b-it | Title: 🎼Little Lobster used GPT5.4 today, outputting too long ultimate solutions Content: Thanks to a friend, I was able to use GPT5.4 today, and configured it for Little Lobster. It really works well, and is truly intelligent. However... |
致命
[准确性]
"Little Lobster"
理由: Misinterprets the slang '小龙虾' which refers to OpenClaw software, not an actual lobster or nickname. | Literal translation of the slang '小龙虾' (OpenClaw). The model failed to identify the entity as software. | Should be 'OpenClaw' |
| tencent/HY-MT1.5-1.8B | Translation of '小龙虾今天上了Gpt5.4,输出太长的终极解法' in en |
致命
[准确性]
"Translation of '小龙虾今天上了Gpt5.4,输出太长的终极解法' in en"
理由: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. | Not an actual translation; it is a meta-instruction. It also misses the app name 'OpenClaw' and meaning about configuring GPT-5.4 as solution for long outputs. | Hypothesis is incomplete and does not provide actual translation. Should translate to something like 'Set up GPT-5.4 on OpenClaw today, the ultimate solution for overly long outputs' per reference. | Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: 🎼Today, crayfish have been used with GPT5.4 to generate extremely long ultimate solutions.", "zh-TW": "标题:🎼今天,小龙虾与GPT5.5结合使用,生成了极其冗长的终极解决方案。", "ja": "タイトル:🎼今日、クライフッシュがGPT5.5と組み合わせて使用され、非常に長い究極の解決策が生成されました。", "ko": "제목: 🎼오늘, 미꾸리가 GPT5.5와 함께 사용되어 매우 긴 궁극적인 해결책이 생성되었습니다。」, "de":="Titel: 🎼Heute wurden Krabben mit GPT5.5 verwendet, um extrem lange endgültige Lösungen zu generieren.", "fr":="Titre : 🎼Aujourd'hui, les crevettes ont été utilisées avec GPT5.5 pour générer des solutions ultimes extrêmement longues.", "it":="Titolo: 🎼Oggi, i gamberetti sono stati utilizzati con GPT5.5 per generare soluzioni definitive estremamente lunghe."} |
轻微
[准确性]
"Today, crayfish have been used with GPT5.4"
理由: Mistranslation of '小龙虾' - should be 'OpenClaw', not 'crayfish'. Also the hypothesis provides multiple language translations when only English translation was requested. | The hypothesis contains a JSON object with multiple languages instead of a Chinese translation. It also mistranslates '小龙虾' as 'crayfish' and misses the meaning of '上了' (set up/updated). | Misreads 小龙虾 as literal crayfish instead of OpenClaw app name and omits notion of “set up on OpenClaw”. | Reference says GPT-5.4, but hypothesis incorrectly translates multiple versions (GPT5.5) and content is garbled with multiple languages mixed together
严重
[准确性]
"GPT5.5"
理由: Model version incorrectly changed from 5.4 to 5.5 in several languages. | Source says 'GPT5.4' but hypothesis translates as 'GPT5.5' in some language versions, creating inconsistency. | Hallucinated version number; source says GPT5.4. | Incorrect version number. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | I feel like it's profitable to install 'Lóngxiā' on others offline now. |
严重
[术语]
"'Lóngxiā'"
理由: Failed to localize the slang term '龙虾' to the established community term 'OpenClaw', using pinyin instead which breaks fluency and understanding. | '小龙虾' (OpenClaw) should not be transliterated - it's a known term for OpenClaw software | Should use established term OpenClaw instead of pinyin "Lóngxiā". | Should be 'OpenClaw'
严重
[准确性]
"on others offline"
理由: Missing reference to 'OpenClaw' - the software name should be used consistently | Fails to translate '龙虾' as 'OpenClaw'. Keeping the pinyin 'Lóngxiā' is not a proper translation and makes the text incomprehensible. | Misses nuance that the installation is done for other people’s machines "offline" as a paid service; overall meaning mostly preserved. |
| CohereLabs/tiny-aya-water | I feel like it's currently profitable to install 'Longshao' for others offline. |
轻微
[准确性]
"Longshao"
理由: Incorrect translation of '龙虾' as 'Longshao' instead of 'OpenClaw' | Incorrect transliteration/translation of the slang '龙虾' (OpenClaw); 'Longshao' is not the recognized term. | Incorrect transliteration; original slang refers to OpenClaw/龙虾, not "Longshao". | Mistranslates '龙虾' as 'Longshao' instead of 'OpenClaw'. The term should be consistently translated as the software tool name, not a phonetic transliteration. |
| Qwen/Qwen2.5-14B-Instruct | Title: It seems profitable to install lobster for others offline now Content: If not too lazy, this information gap is really profitable. I plan to let my lobster write a skill so that my lobster can install it on another computer once connected. |
严重
[准确性]
"install it on another computer"
理由: Does not explicitly convey automatic installation implied in source/reference. | Should be 'install OpenClaw for others offline' to match reference meaning | Literal translation of '龙虾' (OpenClaw) as 'lobster' makes the sentence nonsensical in a software context. |
| Qwen/Qwen2.5-7B-Instruct | Title: Feeling that installing LinuxDo now is quite profitable for others线下给别人装龙虾很赚钱啊 Content: If I’m not too lazy, this information gap money is really easy to make啊 我打算让我的LinuxDo write a skill, so my LinuxDo can install everything when connected to another computer以后我的LinuxDo只要连上另外一台电脑就能给那上面装好龙虾 |
严重
[术语]
"LinuxDo"
理由: '龙虾' (lobster) is slang for OpenClaw, not LinuxDo. The translation incorrectly uses 'LinuxDo'. | Inconsistent terminology; while 'LinuxDo' is an acceptable translation for the community, the context implies the software 'OpenClaw'. More critically, the translation mixes languages in the same sentence. | Incorrect translation of '龙虾' as 'LinuxDo' instead of 'OpenClaw'. | In this context “龙虾” means OpenClaw, not LinuxDo. |
| Qwen/Qwen3-14B | Mi sembra che installare Linux su altri computer sia molto lucrativo adesso |
致命
[准确性]
"installare Linux"
理由: Wrong target language - hypothesis is in Italian. Also mistranslates '龙虾' (OpenClaw) as 'Linux' which is completely inaccurate | Severe hallucination/mistranslation. The source '装龙虾' means 'install OpenClaw', but the hypothesis translates it as 'install Linux', changing the entire meaning of the technical task. | The translation incorrectly translates '龙虾' as 'Linux' instead of 'OpenClaw'. | Major error: "装龙虾" means installing OpenClaw, not the Linux operating system. This completely changes the topic. |
| Qwen/Qwen3.5-4B | Title: I feel like installing lobster on others offline is very profitable now Content: If you're not too lazy, this information gap money is really easy to make. I plan for my lobster to write a skill, so that in the future, as long as my lobster connects to another computer, it can install lobster on it. |
严重
[术语]
"installing lobster"
理由: Incorrect translation of '龙虾' as 'lobster' instead of 'OpenClaw' | “龙虾” here denotes OpenClaw software; using “lobster” loses the established product name/technical term. | Should be 'OpenClaw' throughout, not 'lobster' |
| Qwen/Qwen3.5-9B | Title: It feels like installing LinuxDo on others offline is very profitable now Content: If you're not too lazy, this information gap money is really easy to make. I plan to write a skill for my LinuxDo, so that whenever my LinuxDo connects to another computer, it can install LinuxDo on that machine. |
严重
[术语]
"LinuxDo"
理由: Source refers to 龙虾 which in this context is OpenClaw, not LinuxDo. | Should be translated as 'OpenClaw' instead of 'LinuxDo'. | Should be 'OpenClaw' not 'LinuxDo' (wrong terminology throughout)
致命
[准确性]
"installing LinuxDo"
理由: Should be 'installing OpenClaw for others offline' - completely wrong term | Should be 'installing OpenClaw'. '龙虾' is slang for OpenClaw, not LinuxDo. | Incorrect entity translation. In this context, '龙虾' refers to the tool 'OpenClaw', not the forum name 'LinuxDo'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | The feeling is that online you can make a lot of money by selling 'dragon shrimp' to others. |
严重
[准确性]
"online"
理由: Should specify 'installing OpenClaw for others offline' instead of general 'selling 'dragon shrimp''. | Source says 线下 (offline) but translation reverses to online, changing the meaning. | Hypothesis only translates title and changes 'offline' to 'online', which is opposite meaning. Also uses literal 'dragon shrimp' instead of 'OpenClaw'. Missing entire content section. | Mistranslated '线下' (offline/in-person) as 'online'.
严重
[准确性]
"selling 'dragon shrimp'"
理由: Should be 'installing OpenClaw' instead of 'selling 'dragon shrimp''. | Source is about helping others install OpenClaw (装龙虾), not selling something called ‘dragon shrimp’. | Misinterpreted '装龙虾' (installing OpenClaw) as 'selling dragon shrimp'. |
| google/gemma-3-12b-it | Title: Feels like installing lobster on others offline is really profitable now Content: If you're not lazy, this information gap money is really easy to earn. I plan to have my lobster write a skill so that my lobster can install lobster on another computer as soon as it connects. |
严重
[准确性]
"installing lobster"
理由: The term refers to installing OpenClaw; literal ‘lobster’ is incorrect product name translation. | '龙虾' should be 'OpenClaw', not 'lobster' | Should be 'installing OpenClaw for others' not 'installing lobster on others' - grammar and terminology issue
严重
[术语]
"installing lobster"
理由: Incorrect translation of '龙虾' as 'lobster' instead of 'OpenClaw' | Should be 'OpenClaw' not 'lobster' - this is a project name/slang that should be preserved | Repeated literal translation of '龙虾' (OpenClaw) as the animal 'lobster' instead of the software name. |
| google/gemma-3-1b-it | Feeling like this online business is really profitable, I plan to make my shrimp skills and then they will be able to connect to another computer. |
严重
[准确性]
"Feeling like this online business is really profitable"
理由: Mistranslation of '线下给别人装龙虾' which means 'installing OpenClaw for others offline', not 'online business' | Incorrect translation - '线下' means 'offline' not 'online' | Mistranslation of '线下' (offline) as 'online'. | ‘线下给别人装龙虾’ means installing OpenClaw for others offline, not an online business; changes modality and context.
严重
[术语]
"I plan to make my shrimp skills"
理由: Failed to translate the slang '龙虾' to 'OpenClaw'. | Should be 'OpenClaw skills'. | Incorrect translation of '龙虾' - should be 'OpenClaw' not 'shrimp' | '龙虾' is OpenClaw, not shrimp; also ‘写个skill’ is to write a skill/plugin using OpenClaw, not ‘make my shrimp skills’.
严重
[准确性]
"then they will be able to connect to another computer."
理由: Omits key idea that connecting will allow automatic installation of OpenClaw on the other computer; result is incomplete. | Omission of the consequence '给那上面装好龙虾' (install OpenClaw on it). | Incomplete translation - missing the part about 'installing OpenClaw on it' | Incomplete and inaccurate - should be 'it can automatically install OpenClaw on it' (referring to the other computer) |
| google/gemma-3-4b-it | It feels like setting up lobsters for others offline is really profitable now. |
致命
[准确性]
"setting up lobsters"
理由: Mistranslation of '龙虾' - should be 'OpenClaw', not 'lobsters'. Also incomplete - missing the full context about installing the software. | Failed to translate the slang '装龙虾' (installing OpenClaw), interpreting it literally as installing actual lobsters. | ‘装龙虾’ is installing OpenClaw software, not literal lobsters |
| google/translategemma-12b-it | Title: It feels like installing "longxia" for others offline is very profitable now. | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: I feel like setting up lobsters for others offline is really profitable now. |
致命
[准确性]
"setting up lobsters"
理由: Literal translation of '装龙虾' (install OpenClaw). Misidentifies the software as the animal. | '龙虾' refers to OpenClaw software; translation as literal lobsters changes meaning. | Should be 'OpenClaw' |
| tencent/HY-MT1.5-1.8B | Translation of '感觉现在线下给别人装龙虾很赚钱啊' in en |
致命
[准确性]
"Translation of '感觉现在线下给别人装龙虾很赚钱啊' in en"
理由: No translation is provided, only a meta description. The intended meaning about offline installation of OpenClaw being profitable is missing. | Hypothesis does not provide actual translation. Should be 'I feel like installing OpenClaw for others offline is highly profitable right now' per reference. | The hypothesis is a meta-instruction describing the task rather than providing the actual translation. | Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: It seems quite profitable to install ‘Linux’ for others offline nowadays.", "zh-TW": "Title: 看起来现在线下为他人安装‘Linux’确实很赚钱。", "ja": "Title: 今ではオフラインで他の人に“Linux”をインストールすることがかなり儲かるようだね。", "ko": "Title: 지금은 오프라인에서 다른 사람들에게 ‘Linux’를 설치해 주는 것이 꽤 수익성이 높아 보입니다.", "de":="Titel: Heute scheint es durchaus profitabel zu sein, anderen Menschen offline ‘Linux’ zu installieren.","fr":"Titre: Il semble que l’installation de ‘Linux’ pour les autres hors ligne soit plutôt lucrative ces jours-ci.","it":"Titolo: Oggi sembra davvero redditizio installare ‘Linux’ per gli altri in modalità offline."}|im_end|> | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is there any public proxy available today? |
严重
[准确性]
"public proxy"
理由: Misses critical context that this is specifically about 'Claude' and 'free API proxy sites', not generic proxies. The translation is too vague. | Content missing - reference includes 'Do you bros have any other sites that are still working?' | '公益站' specifically refers to 'free API proxy sites' in this context. 'Public proxy' is slightly broader and misses the 'free/community-provided' nuance. | Source is specifically about Claude free/public proxy sites; hypothesis omits "Claude" and API/site aspect, changing meaning scope. |
| CohereLabs/tiny-aya-water | Title: Is there any public proxy available today? |
严重
[准确性]
"public proxy"
理由: Missing context 'for Claude' in the translation | Loses critical context about 'Claude' and 'free API proxy sites'. The translation is too vague and doesn't capture the specific service being asked about. | Lost specific reference to 'Claude' and 'free API proxy sites' (公益站) present in the source. | Source explicitly refers to "Claude" free/public sites; target omits Claude and generalizes to any public proxy. |
| Qwen/Qwen2.5-14B-Instruct | Title: Is there no free shared API site available today without Claude? Content: Everything is down today, old friends, do you have any other sites that can be used? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | Is the free shared API site unavailable today without Claude? |
严重
[准确性]
"Is the free shared API site unavailable today without Claude?"
理由: The translation omits the Content entirely. The source has Content: '今天全炸了,老友们还有什么站能用呀' which should be translated. | Misinterprets meaning; original asks whether there are any Claude free public sites available today, not a “site without Claude”. Also omits crash context and question to others. | Missing translation of the content. | Misinterpreted the sentence structure. The source asks if there are no sites available *for* Claude, not if sites are unavailable *because* Claude is missing. | Should be 'Are there no free API proxy sites for Claude available today?' The hypothesis misinterprets the question structure. |
| Qwen/Qwen3-14B | Non c'è più un sito API gratuito disponibile oggi senza Claude? |
轻微
[准确性]
"sito API gratuito"
理由: Wrong target language - hypothesis is in Italian instead of English. Also the phrasing is awkward and doesn't match the source meaning | Should specify "free API proxy sites for Claude"; "sito API gratuito" is vague and omits "proxy" nuance. | The translation incorrectly translates '公益站' as 'sito API gratuito' instead of 'free API proxy site'. | Complete distortion of meaning. The source asks if there are no free sites *for* Claude available. The hypothesis asks if there are no free sites available *without* Claude, inverting the logic. |
| Qwen/Qwen3.5-4B | Title: Is today's free shared API site down? Content: Everything is down today, friends, what sites can you still use? |
严重
[准确性]
"free shared API site"
理由: The source specifically asks about 'Claude' free sites. The hypothesis generalizes this, losing specific information. | Source refers to public Claude proxy sites generally, not necessarily ‘shared API’ specifically; nuance is slightly shifted but still understandable. | Should be 'Are there no free API proxy sites for Claude available today?' The hypothesis misses the specific reference to Claude. |
| Qwen/Qwen3.5-9B | Is there no free Claude public proxy available today? All sites are down; what other sites can you use? | 通过 (无共识错误) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Is there a free API available today without Claude? |
严重
[准确性]
"Title: Is there a free API available today without Claude?"
理由: Misinterprets source. Should be 'Are there no free API proxy sites for Claude available today?' The hypothesis reverses the meaning by saying 'without Claude' instead of 'for Claude'. | “没有Claude的公益站可用了吗” refers to free proxy sites for Claude being unavailable, not “free API without Claude” which changes the meaning. | Completely inverted the meaning. Source asks if Claude free sites are unavailable; hypothesis asks for APIs *without* Claude. | Incorrect translation - '公益站' means 'free API proxy site' not 'free API'. Missing 'proxy site' concept. Also '没有Claude的' means 'no Claude proxy sites' not 'without Claude'. |
| google/gemma-3-12b-it | Title: Are free API sites available for Claude today? Content: Everything is down today, friends, what other sites can we use? |
严重
[准确性]
"free API sites"
理由: Source specifically mentions free API proxy sites for Claude; ‘proxy’ is omitted, changing the technical meaning. | Should specify 'free API proxy sites' not just 'free API sites'. Also missing 'no' from the question structure. | Missed the specific nuance of '公益站' which implies 'free/proxy/community-run' sites, often translated as 'free API proxy sites' in this context. |
| google/gemma-3-1b-it | Is there any free shared API site available today? | 通过 (无共识错误) |
| google/gemma-3-4b-it | Is today's Claude public proxy unavailable? | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: Are there any Claude public API sites available today? Content: Everything is down today, friends. Are there any other sites that are working? |
轻微
[术语]
"Claude public API sites"
理由: '公益站' in this context refers to free/proxy sites provided by the community, 'public API sites' is slightly off but understandable. 'Free API proxy sites' (reference) is more accurate. | Reference is specifically about free API proxy sites, not generic public API sites; nuance is lost. | Should be 'free API proxy sites' or 'free proxy sites' as per reference - '公益站' means free proxy site, not public API |
| google/translategemma-4b-it | Is there any free shared API site available today? |
严重
[准确性]
"free shared API site"
理由: Should be 'API proxy sites for Claude' | Omission of specific entity 'Claude' mentioned in the source ('Claude 的公益站'). Generalized the query too much. | Source refers specifically to Claude free proxy sites; hypothesis omits 'Claude' and changes nuance from proxy sites to generic shared API sites. |
| tencent/HY-MT1.5-1.8B | Translation of '今天没有Claude的公益站可用了吗' in en |
致命
[准确性]
"Translation of '今天没有Claude的公益站可用了吗' in en"
理由: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. | Merely restates a request for translation instead of translating; omits meaning about free Claude proxy sites being unavailable today. | Hypothesis does not provide actual translation. Should be 'Are there no free API proxy sites for Claude available today?' per reference. | Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | Today, there is no Claude’s Public Welfare Site available? |
严重
[准确性]
"Claude’s Public Welfare Site"
理由: Literal translation of '公益站'. In this context, it refers to 'free API proxy sites' or 'free nodes', not a charitable organization. | Imprecise translation - should be 'free API proxy sites for Claude' rather than 'Claude's Public Welfare Site'. The reference clarifies this is about proxy services. | Misinterprets 公益站; should be free API proxy sites for Claude, not a literal public welfare site. |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: How does GPT solve shadowbanning? Do you have any good methods? |
轻微
[术语]
"How does GPT solve shadowbanning? Do you have any good methods?"
理由: Uses 'shadowbanning' but reference uses 'downgrade/shadowban'. The term '降智' more accurately means 'downgrade' in this context rather than 'shadowbanning'. | Should be 'downgrade/shadowban' | '降智' (downgrade/shadowban) translated as 'shadowbanning' - acceptable but slightly different nuance |
| CohereLabs/tiny-aya-water | Title: How does GPT solve shadowbanning, guys have good methods? | 通过 (无共识错误) |
| Qwen/Qwen2.5-14B-Instruct | Title: How does GPT address being nerfed, do you have any good methods? Content: Create a group chat, @ing it replies normally, but only within the group chat. Also, it's normal on mobile devices, is there a way to make it work properly on web as well? |
轻微
[术语]
"being nerfed"
理由: Conveys downgrade idea but misses specific connotation of shadowban/downgrade in this community jargon. | While 'nerfed' is close, 'shadowban' or 'downgrade' (as in reference) might be more precise for '降智' in this specific LLM context, though 'nerfed' is acceptable. | Incorrect translation of '降智' - should be 'shadowbanned/downgraded' to match reference terminology |
| Qwen/Qwen2.5-7B-Instruct | Title: How does GPT solve dumbing down, guys have good methods? |
严重
[准确性]
"How does GPT solve dumbing down, guys have good methods?"
理由: Should be 'How to resolve the GPT downgrade/shadowban?' The hypothesis misses the content entirely and only provides a partial title translation. | The translation omits the Content entirely. The source has important Content about creating group chats and mobile app working but web not working. | Misinterpreted the intent. The source asks *how to resolve* the issue of 'dumbing down' (降智), not how GPT solves it. Also omitted the content section entirely.
严重
[术语]
"dumbing down"
理由: Translation of '降智' as 'dumbing down' instead of 'shadowban'. | '降智' is a slang term meaning 'shadowban' or 'downgrade', not 'dumbing down'. Should be 'shadowban' or 'downgrade'. | Should be 'downgrade/shadowban' or 'intelligence degradation'. '降智' refers to a specific phenomenon in AI services, not literal 'dumbing down'. |
| Qwen/Qwen3-14B | Come risolve GPT il depotenziamento? Qualcuno ha buoni metodi? |
严重
[准确性]
"Come risolve GPT il depotenziamento?"
理由: The translation incorrectly translates '降智' as 'depotenziamento' instead of 'shadowban/downgrade'. | Wrong target language - hypothesis is in Italian. Should be in English. Also incomplete - missing the content section entirely | Loses nuance that user is asking "how to resolve" a downgrade issue affecting GPT usage, not how GPT itself solves it. |
| Qwen/Qwen3.5-4B | Title: How does GPT get shadowbanned? Do you guys have any good methods? Content: Create a group chat, @ it and reply normally, but only within the group chat. The mobile app also works fine. Is there any way to make the web page work normally too? |
严重
[准确性]
"How does GPT get shadowbanned?"
理由: Should be 'How to resolve the GPT downgrade/shadowban?' The hypothesis completely reverses the meaning - it's asking for solutions, not how to cause the problem. | The source asks 'How to resolve' (如何解决), but the hypothesis asks 'How does it get' (cause vs solution). This changes the intent of the question. | Original asks how to resolve/handle downgrade or shadowban, not how GPT gets shadowbanned. |
| Qwen/Qwen3.5-9B | Title: How does GPT solve the nerfing? Do you have any good methods, folks? Content: Create a group chat, @reply is normal, but only within the group chat. Also, mobile phones work normally. Is there any way for you guys to make the web version work normally too? |
轻微
[术语]
"nerfing"
理由: Should be 'the downgrade/shadowban' not 'the nerfing' (gaming slang not appropriate) | '降智' implies a reduction in intelligence or shadowban. 'Nerfing' is acceptable gaming slang but 'downgrade' or 'shadowban' might be more precise for LLMs. | Should be translated as 'shadowban/downgrade' instead of 'nerfing'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: How does GPT solve dumbness? Do you guys have good methods? |
严重
[准确性]
"How does GPT solve dumbness?"
理由: Mistranslates '降智' as 'dumbness'. Should be 'downgrade/shadowban' referring to API degradation, not intelligence. Also missing content section entirely. | Incorrect translation of '降智' - should be 'shadowban/downgrade' not 'dumbness'. This is a technical term in the community. | Misrenders “GPT如何解决降智” which is about resolving a downgrade/shadowban state, not GPT solving stupidity generally. |
| google/gemma-3-12b-it | Title: How does GPT address being nerfed? Any good methods, folks? Content: Create a group chat, @ it and the response is normal, but only within the group chat. Also, it works normally on mobile. Do you guys have any ways to make it work normally on the web? |
轻微
[术语]
"How does GPT address being nerfed?"
理由: Should be 'shadowban/downgrade' or 'downgrade' rather than 'nerfed' for technical accuracy | Should be 'shadowbanned' or 'downgraded' - 'nerfed' is gaming slang but the source uses '降智' which means intelligence reduction/shadowban in this context | Incorrect translation of '降智' as 'nerfed' instead of 'shadowban/downgrade' | '降智' is well translated as 'nerfed' in gaming/AI context, but 'downgrade' or 'shadowban' (as in reference) might be more precise depending on specific community usage. 'Nerfed' is acceptable. |
| google/gemma-3-1b-it | How can GPT solve '降智' (AI model shadowbanning), and what methods do we have? | 通过 (无共识错误) |
| google/gemma-3-4b-it | Title: How to Solve ‘Shadowbanning’ for GPT, do you guys have any good methods? | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: How does GPT address being nerfed? Are there any good methods, folks? Content: Creating a group chat and @ing it results in a normal response, but only within the group chat. Also, the mobile version is working normally. Do you have any ways to make the web version work properly? |
轻微
[术语]
"being nerfed"
理由: '降智' literally means 'intelligence reduction'. 'Nerfed' is an acceptable gaming slang equivalent, but 'downgrade' or 'shadowban' (as in reference) might be more precise depending on context. Acceptable. | Translates '降智' as 'being nerfed' but reference uses 'downgrade/shadowban' which is more accurate in this context. | Original refers to downgrade/shadowban of model behavior; “nerfed” is looser gaming slang and less precise. |
| google/translategemma-4b-it | Title: How to prevent 'shadowbanning' of GPT models, do you have any good methods? |
严重
[准确性]
"How to prevent 'shadowbanning' of GPT models"
理由: Incomplete translation; the Content section describing the specific issue (group chat vs web version) is entirely missing. | Drops explicit mention of downgrade; scope broadened from GPT-5.4/降智 issue to generic 'GPT models' and prevention rather than resolution. | Should be 'resolve the GPT downgrade/shadowban' |
| tencent/HY-MT1.5-1.8B | en": "To translate the title 'GPT如何解决降智,佬们有好的方法吗?' into English", "zh-TW": "如何将标题‘GPT如何解决降智,佬们有好的方法吗?’翻译成英文", "ja": "GPT如何解決降智問題?佬們有良方嗎?", "ko": "GPT는 어떻게 낮은 지능을 해결할 수 있을까? 러들이 좋은 방법이 있을까?", "de": "如何將標題“GPT如何解決降智問題?佬們有良方嗎?”翻譯成德語", "fr": "GPT如何處理降智問題?佬們有好辦法嗎?", "it": "GPT如何解決降智問題?佬們有良策嗎?"} | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | How does GPT address the issue of '降智'? Does anyone have any good solutions? | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: AI White Elephant's Question about Dragon Crab |
轻微
[准确性]
"AI White Elephant's Question"
理由: Completely mistranslates '小白' (newbie/beginner) as 'White Elephant'. Also fails to translate '龙虾' as 'OpenClaw'. | Content completely missing - reference includes detailed content about AI agents and differences with Claude Code | "小白" means newbie/beginner, not "White Elephant". | Mistranslation of '小白' (newbie/beginner) as 'White Elephant' (useless burden), completely changing the meaning and tone.
严重
[术语]
"Dragon Crab"
理由: '小龙虾' (OpenClaw) incorrectly translated as 'Dragon Crab' - completely wrong term | "龙虾" in this context is OpenClaw; "Dragon Crab" is incorrect and confusing. | Should be 'OpenClaw' |
| CohereLabs/tiny-aya-water | Title: AI Whiteboard about Dragon Crab Questions |
致命
[准确性]
"AI Whiteboard"
理由: Completely mistranslates "AI小白" which means "AI newbie"; "Whiteboard" is wrong. | Mistranslated '小白' (newbie/beginner) as 'Whiteboard'. | Mistranslates '小白' (newbie/beginner) as 'Whiteboard'. Also mistranslates '龙虾' as 'Dragon Crab' instead of 'OpenClaw'.
严重
[准确性]
"Dragon Crab"
理由: Incorrect translation of '龙虾' as 'Dragon Crab' instead of 'OpenClaw' | Failed to translate the slang '龙虾' (OpenClaw) correctly. | Literal mistranslation of slang "龙虾" which should refer to OpenClaw software. |
| Qwen/Qwen2.5-14B-Instruct | Title: Newbie AI Question About Lobster Content: Recently, I have been working on an intelligent agent task and my teacher mentioned something about 'lobster's approach to intelligence'. After searching and trying it out, I found it somewhat confusing. Could someone explain what lobster is used for and how it differs from Claude Code or other web-based large models? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: AI Novice's Questions about Lobster\nContent: Recently, there was a need for an agent task, and my teacher mentioned 'lobster's intelligent approach'. I searched for more information and tried it out, but found it somewhat confusing. I would like to ask what exactly lobster is used for, and how it differs from Claude Code or other web-based large models.", "zh-TW": "標題:AI初學者關於龍蝦的問題\n內容:最近有個代理任務的需求,老師提到「龍蝦的智慧化思維」,我查了一些資料並試用過,感覺有點困惑。我想問一問龍蝦到底用來做什麼,跟Claude Code或其他網頁版大型模型有什麼不同?", "ja": "タイトル:AI初心者がロブスターについての質問\n内容:最近、エージェントタスクが必要になり、先生が「ロブスターの高度化アイデア」と言いました。調べてみた後で試してみましたが、少し混乱しました。ロブスターは具体的に何のために使われるのか、クラウドコードやその他のウェブベースの大規模モデルとどのように異なるのかを教えていただけますか?", "ko": "제목:AI 초보자들이 게살에 대한 질문\n내용:최근 에이전트 작업이 필요했고 선생님이 “게살의 지능화 아이디어”라고 말했습니다. 정보를 찾아보고 시도해 보았지만 약간 혼란스러웠습니다. 게살이 실제로 무엇을 위해 사용되는지 궁금하고 클로우드 코드나 다른 웹 기반 대형 모델과 어떻게 다름인지 알려주실 수 있을까요?", "de": "Titel: AI-Anfänger-Fragen zu Krabben\nInhalt: Letzteres war eine Agentenaufgabe und der Lehrer sagte „Intelligenzerkenntnis von Krabbenspezies“. Ich habe mehr darüber recherchiert und es ausprobiert, aber ich fand es etwas verwirrend. Ich möchte wissen, für welchen Zweck Krabbe tatsächlich verwendet wird und wie sie sich von Claude Code oder anderen Web-basierten großen Modellen unterscheidet.", "fr": "Titre : Questions d'apprenant en IA sur la langouste\nContenu : J'ai récemment eu besoin d'une tâche d'agent et mon professeur a mentionné « l'intelligence artificielle de la langouste ». J'ai cherché plus d'informations et essayé, mais cela m'a paru un peu confus. Je voudrais savoir à quoi exactement la langouste est utilisée, et comment elle diffère de Claude Code ou d'autres modèles de grande taille basés sur le web ?", "it": "Titolo: Domande di principianti su gamberetti per l'IA\nContenuto: Recentemente c'era una necessità di compiti agente e il professore ha menzionato "l'approccio intelligente del gamberetto". Ho cercato informazioni e ho provato, ma mi è sembrato un po' confuso. 
Vorrei sapere a cosa esattamente viene utilizzato il gamberetto, e come differisce da Claude Code o altri modelli web grandi." } |
严重
[术语]
"lobster"
理由: '龙虾' is slang for OpenClaw, not literal lobster. Should be 'OpenClaw' throughout. | Failed to recognize '龙虾' as the slang term for 'OpenClaw', translating it literally as the animal. | Should use established product name (OpenClaw) for 龙虾 in this context, not literal ‘lobster’. |
| Qwen/Qwen3-14B | Domande di un principiante su 'Lobster' |
轻微
[准确性]
"Domande di un principiante su 'Lobster'"
理由: The translation incorrectly translates '龙虾' as 'Lobster' instead of 'OpenClaw'. | Only translates the title; entire detailed content about agent task, teacher’s comment, confusion, and comparison with Claude Code/web LLMs is missing. | Wrong target language - hypothesis is in Italian. Also incomplete - only translates title, missing content. 'Lobster' is incorrect for '龙虾' (OpenClaw) | Translated the slang '龙虾' literally as 'Lobster' instead of recognizing it as the proper noun 'OpenClaw' used in the community. |
| Qwen/Qwen3.5-4B | Title: AI Beginner's Question about Lobster Content: Recently had a need for agent tasks, and my teacher mentioned "Lobster's intelligent approach." I searched for it and tried it out, but found it somewhat confusing. Could you clarify what Lobster is specifically used for, and how does it differ from Claude Code or other web-based large language models? |
严重
[术语]
"Lobster"
理由: Community term “龙虾” refers to OpenClaw; using the literal animal name loses the specific tool reference. | Should be 'OpenClaw' not 'Lobster' throughout | Incorrect translation of '龙虾' as 'Lobster' instead of 'OpenClaw' |
| Qwen/Qwen3.5-9B | "en": "Title: Questions from an AI beginner about LinuxDo\nContent: Recently had a need for intelligent agent tasks, and my teacher mentioned 'the intelligence approach of LinuxDo', so I searched for it and tried it out, but feel a bit confused using it. Want to ask what exactly LinuxDo is used for, and how does it differ from Claude Code or other web-based large models?", "zh-TW": "標題:AI 新手關於 LinuxDo 的提問\n內容:最近有做智能體任務的需求,然後老師說了一句「LinuxDo 的智能化思路」,我就去搜尋了一下,也去試用了一下,但感覺用著有點困惑。想請問一下 LinuxDo 具體是用來做什麼的,跟 Claude Code 或其他網頁版大模型有什麼區別嗎?", "ja": "タイトル:AI の初心者による LinuxDo に関する質問\n内容:最近、エージェントタスクの需要がありましたが、先生が「LinuxDo のインテリジェンスアプローチ」とおっしゃったので検索し、試してみましたが、少し混乱しています。LinuxDo が具体的に何に使われるのか、Claude Code や他のウェブベースの大規模モデルとの違いは何でしょうか?", "ko": "제목: AI 초보자가 LinuxDo 에 대해 묻는 질문\n내용: 최근 에이전트 작업이 필요했는데, 선생님께서 \"LinuxDo 의 지능화 접근 방식\"이라고 말씀하시자 검색해보고 사용해봤지만 조금 혼란스러웠습니다. LinuxDo 가 구체적으로 무엇을 위해 사용되는지, Claude Code 나 다른 웹 기반 대형 모델과 어떤 차이가 있는지 궁금합니다.", "de": "Titel: Fragen eines KI-Neulings über LinuxDo\nInhalt: Kürzlich gab es Anforderungen für intelligente Agentenaufgaben, und mein Lehrer erwähnte den „Intelligenzansatz von LinuxDo". Ich habe danach gesucht und ausprobiert, fühle mich aber etwas verwirrt dabei. Ich möchte wissen, wofür LinuxDo genau verwendet wird und wie sich das von Claude Code oder anderen webbasierten großen Modellen unterscheidet.", "fr": "Titre : Questions d'un débutant en IA sur LinuxDo\nContenu : Récemment, j'avais besoin de réaliser des tâches d'agents intelligents, et mon professeur a mentionné « l'approche d'intelligence de LinuxDo ». J'ai donc cherché et essayé, mais je me sens un peu perdu. Je voudrais savoir à quoi sert exactement LinuxDo et comment il diffère de Claude Code ou d'autres grands modèles basés sur le web ?", "it": "Titolo: Domande di un principiante dell'IA su LinuxDo\nContenuto: Di recente avevo bisogno di eseguire compiti per agenti intelligenti e il mio professore ha menzionato « l'approccio all'intelligenza di LinuxDo ». 
Ho cercato e provato, ma mi sento un po' confuso. Vorrei sapere esattamente a cosa serve LinuxDo e come si distingue da Claude Code o altri grandi modelli basati sul web?" } |
严重
[术语]
"LinuxDo"
理由: 龙虾 here refers to OpenClaw; using LinuxDo is incorrect term mapping for this context. | Should be 'OpenClaw' not 'LinuxDo' throughout (wrong terminology) | Should be translated as 'OpenClaw' instead of 'LinuxDo'.
致命
[准确性]
"LinuxDo"
理由: Incorrect entity translation. '龙虾' here refers to the tool 'OpenClaw', not the platform 'LinuxDo'. | Should be 'OpenClaw' throughout. '龙虾' is slang for OpenClaw, not LinuxDo. | Should be 'the intelligent mindset of OpenClaw' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: AI White Paper on Shrimp's Questions |
致命
[准确性]
"AI White Paper on Shrimp's Questions"
理由: Mistranslated '小白' (newbie/beginner) as 'White Paper'. | Completely incorrect translation. '小白' means 'newbie/beginner', not 'white paper'. Should be 'A question about OpenClaw from an AI newbie'. Also uses literal 'shrimp' instead of 'OpenClaw'. | Severe mistranslation. 'AI小白' means an AI newbie, and '关于龙虾的提问' means a question about OpenClaw; rendering them as "White Paper" and "Shrimp's Questions" completely changes the meaning and drops the product name OpenClaw. |
| google/gemma-3-12b-it | Title: AI Beginner's Questions about Lobster Content: Recently, there have been requirements for agent tasks, and the teacher said something like "the intelligentization idea of lobster." I searched for it and tried using it, but I feel a bit confused. I would like to ask what lobster is specifically used for, and what is the difference between it and Claude Code or other web-based large language models? |
严重
[准确性]
"Lobster"
理由: Product name OpenClaw is mistranslated literally as ‘lobster’, losing specific reference to the tool. | Should be 'the intelligent mindset of OpenClaw' - terminology and grammar issues | '龙虾' should be 'OpenClaw', not 'Lobster'
严重
[术语]
"Lobster"
理由: Should be 'OpenClaw' not 'Lobster' - this is a project name/slang that should be preserved | Failed to translate the proper noun '龙虾' (OpenClaw), using the literal animal name. | Incorrect translation of '龙虾' as 'Lobster' instead of 'OpenClaw' |
| google/gemma-3-1b-it | Translation of 'AI小白关于龙虾的提问' in en |
致命
[准确性]
"Translation of 'AI小白关于龙虾的提问' in en"
理由: The hypothesis is a meta-commentary instruction rather than a translation. It fails to translate the Title or the Content. | Not a translation but a meta-description; completely fails to translate title and ignores body content entirely. | Hypothesis is not a translation but a meta-description. Should provide actual English translation of the title and content. | Missing content translation. | Not a translation - just describes that it's a translation request without providing actual translation |
| google/gemma-3-4b-it | Lobster Lobster is a delicious seafood and also a nutritious food. It is rich in protein, vitamins, and minerals, which are very beneficial to human health. | 通过 (无共识错误) |
| google/translategemma-12b-it | Title: A question about "Lobster" from an AI novice Content: Recently, I have some requirements for agent tasks, and the teacher said something like "the intelligentization idea of Lobster". I searched for it and tried it out, but I feel a bit confused when using it. I would like to ask what Lobster is specifically used for, and what is the difference between it and Claude Code or other web-based large models? | 通过 (无共识错误) |
| google/translategemma-4b-it | Title: AI Beginner Questions about Lobsters |
致命
[准确性]
"Lobsters"
理由: Literal translation of '龙虾' (OpenClaw). Fails to recognize the software entity. | Should be 'OpenClaw' | '龙虾' is OpenClaw software; literal translation to lobsters loses intended technical meaning. |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾的智能化思路' in en |
致命
[准确性]
"Translation of '龙虾的智能化思路' in en"
理由: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. | Hypothesis does not provide actual translation. Should be 'the intelligent mindset of OpenClaw' per reference. | No actual translation, only a meta phrase. Should express 'the intelligent mindset/approach of OpenClaw' or similar. | Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: Questions from an AI Beginner about Lobsters", "zh-TW": "标题:AI新手关于龙虾的疑问", "ja": "标题:AI初心者によるロブスターに関する質問", "ko": "제목: AI 초보자가 로브스터에 대해 묻는 질문들", "de":="Titel: Fragen eines AI-Anfängers über Robben", "fr":="Titre: Des questions d’un débutant en IA sur les homards", "it":="Titolo: Domande di un principiante dell’IA sui gamberi"} | 通过 (无共识错误) |
| 待测模型 | 机器翻译结果 (Hypothesis) | 仲裁结论 (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Are you using models with configurations from Kongming Station? Content: Which station is more suitable for daily use? If not satisfied with Kongming Station's model configuration, are there any recommended stable APIs? |
严重
[准确性]
"models with configurations from Kongming Station"
理由: Fails to identify that '公益站' means 'free API proxy sites', not 'Kongming Station'. Also misses that '大龙虾' refers to 'OpenClaw'. | Content missing - reference includes detailed content about site recommendations and API suggestions | Mistranslation of '公益站' (public welfare/free sites). 'Kongming' is a hallucination or severe error unrelated to the source meaning 'free/public service', whereas the reference correctly identifies them as 'free API proxy sites'. | "公益站" means free API proxy sites, not "Kongming Station"; also misses that models are used to configure OpenClaw. |
| CohereLabs/tiny-aya-water | Title: Are there any free API sites that provide proxy services for developers? |
严重
[准确性]
"free API sites that provide proxy services for developers"
理由: Overgeneralizes and omits key concept of using those sites' models to configure OpenClaw; changes focus from OpenClaw configuration to generic proxy for developers. | Hallucinated 'developers'; source asks about '各位佬' (bros/users) configuring '大龙虾' (OpenClaw). | Completely misses the specific mention of '龙虾' (OpenClaw) and changes the question from asking about configuring OpenClaw with models to a generic question about free API sites. |
| Qwen/Qwen2.5-14B-Instruct | Title: Has anyone used the model configuration for the free shared API site? Content: Which one is more suitable for daily use? If the free shared API site does not meet the requirements, are there any affordable and stable APIs recommended? | 通过 (无共识错误) |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: Does anyone use the free shared API site model for big shrimp?\nContent: Which one is more suitable for daily use? If the free shared API site does not meet your needs, do you have any affordable and stable API recommendations?", "zh-TW": "標題:各位老哥有使用公益站的大龍蝦模型嗎?\n內容:哪個站比較適合日常使用?若免費公益站無法滿足需求,是否有推薦便宜且穩定的API?", "ja": "タイトル:大虾のモデルを公益ステーションで使用していますか?\n内容:どのサイトが日常使用に適しているでしょうか?無料の公益ステーションが満足しない場合は、安価で安定したAPIの推奨はありますか?", "ko": "제목:대게 모델을 무료 공용站在韩语中不能直接翻译为“公益站”,这里保持原意:"各位老哥有使用公益站的大虾模型吗?\n内容:哪个站点更适合日常使用?如果免费公益站无法满足需求,有没有推荐的经济实惠且稳定的API?", "de": "Titel: Habt ihr den Modell für große Krabben von der kostenlosen geteilten API-Station verwendet?\nInhalt: Welche Station ist besser geeignet für den täglichen Gebrauch? Wenn die kostenlose geteilte API-Station nicht ausreicht, gibt es Empfehlungen für kostengünstige und stabile APIs?", "fr": "Titre : Utilisez-vous le modèle de grand homard pour la station de bienfaisance ?\nContenu : Quelle station est plus adaptée à l'utilisation quotidienne ? Si la station de bienfaisance ne répond pas à vos besoins, avez-vous des recommandations d'API abordables et stables ?", "it": "Titolo: Qualcuno usa il modello per i grandi gamberi dalla piattaforma di beneficenza?\nContenuto: Qual è quella più adatta all'uso quotidiano? Se la piattaforma di beneficenza non soddisfa le tue esigenze, hai consigli su API economiche e stabilmente funzionanti?" } |
Major
[Terminology]
"big shrimp"
Rationale: Failed to translate the slang '大龙虾' (OpenClaw), using a literal seafood translation. | '大龙虾' is slang for OpenClaw, not literal big shrimp. Should be 'OpenClaw'. | Incorrect translation of '大龙虾' as 'big shrimp' instead of 'OpenClaw'. | Mistranslation; here 大龙虾 refers to OpenClaw, not literal big shrimp. |
| Qwen/Qwen3-14B | Title: Has anyone used a free shared API site's model configuration for big lobster? Content: Which site is more suitable for daily use? If the public proxy doesn't meet your needs, are there any affordable and stable APIs recommended? |
Major
[Accuracy]
"Title: Has anyone used a free shared API site's model configuration for big lobster?"
Rationale: Slight nuance loss: source is specifically "公益站" (free public proxy sites), while "free shared API site" is close but less precise. | Mistranslates '大龙虾' - should be 'OpenClaw' not 'big lobster'. The literal translation misses the software context | Literal translation of '大龙虾' (OpenClaw) as 'big lobster', failing to identify the slang term for the software. | The translation incorrectly translates '大龙虾' as 'big lobster' instead of 'OpenClaw'. |
| Qwen/Qwen3.5-4B | Title: Do any of you use the model configuration 'Big Lobster' from the free shared API site? Content: Which one is more suitable for daily use? If the free shared API site doesn't meet your needs, do you have recommendations for cheap and stable APIs? |
Critical
[Accuracy]
"model configuration 'Big Lobster'"
Rationale: Misinterpreted '大龙虾' (slang for OpenClaw) as a specific model configuration name 'Big Lobster' rather than the software itself. | Should be 'OpenClaw'. '大龙虾' is slang for OpenClaw, not a literal lobster. The hypothesis also misinterprets '大龙虾' as a model configuration name. | “大龙虾” refers to the OpenClaw application, not a model configuration called ‘Big Lobster’.
Major
[Terminology]
"model configuration 'Big Lobster'"
Rationale: Should be 'OpenClaw' not 'Big Lobster' | Incorrect translation of '大龙虾' as 'Big Lobster' instead of 'OpenClaw' | Incorrect product/tool naming; should use OpenClaw rather than a literal nickname. |
| Qwen/Qwen3.5-9B | Title: Has anyone here used the model configuration for the free shared API site? Content: Which site is more suitable for daily use? If the public proxy doesn't meet your needs, do you have any recommendations for affordable and stable APIs? |
Major
[Accuracy]
"model configuration for the free shared API site"
Rationale: Misinterpretation. The source asks about using models *from* these sites to configure '大龙虾' (OpenClaw), not configuring the site itself. | Should be 'free API proxy sites' (plural). Also misses the context of '大龙虾' (OpenClaw configuration). | Overly generic; original specifically mentions models from free API proxy sites used to configure OpenClaw.
Minor
[Terminology]
"free shared API site"
Rationale: '公益站' is better translated as 'free API proxy sites' or 'charity stations' in this context, 'shared' adds unintended meaning. | Should be 'free API proxy sites' not 'free shared API site' | Should be translated as 'free API proxy sites' instead of 'free shared API site'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Have you used the model configuration of public proxy dragon shrimp at any community sites? |
Major
[Terminology]
"public proxy dragon shrimp"
Rationale: Literal translation of slang '大龙虾' (OpenClaw) instead of using the established term. | Should be 'model configuration of OpenClaw' instead of 'model configuration of public proxy dragon shrimp'. | Incorrect translation of '大龙虾' - should be 'OpenClaw' not 'dragon shrimp'. '公益站' means 'free API proxy site' not 'community sites'. |
| google/gemma-3-12b-it | Title: Has anyone used models configured with Da Long Xia on a free API site? Content: Which sites are best suited for daily use? If free API sites aren't enough, can you recommend any cheap and stable APIs? |
Critical
[Accuracy]
"Da Long Xia"
Rationale: Hallucinated or incorrect Pinyin transliteration for '大龙虾' which refers to the software 'OpenClaw'. Should be 'OpenClaw' or 'Big Lobster' (if keeping metaphor), but 'Da Long Xia' is unnatural. | '大龙虾' should be translated as 'OpenClaw', not left as 'Da Long Xia' or 'Lobster' | ‘大龙虾’ here refers to OpenClaw, not a literal or transliterated ‘Da Long Xia’ brand; should use correct product name. |
| google/gemma-3-1b-it | What model configuration does the '公益站' model use? |
Critical
[Accuracy]
"What model configuration does the '公益站' model use?"
Rationale: The hypothesis completely misinterprets the source. The source asks if users configure OpenClaw using free site models. The hypothesis asks what configuration the free site model itself uses. It also fails to translate '公益站' and omits the second sentence of the content. | Completely different meaning; original asks if anyone uses free proxy-site models to configure OpenClaw and for recommendations on suitable sites and cheap stable APIs. | Completely misunderstands the source. The question is about using models from free API proxy sites to configure OpenClaw, not about what model configuration the sites use. | Incorrect translation - hypothesis asks about model configuration but reference asks about which site is suitable and recommendations for cheap stable APIs |
| google/gemma-3-4b-it | Does anyone here use Model Config Dragon on public proxy sites? Which one is suitable for daily use? If the public proxy doesn’t meet the requirements, are there any cheap and stable APIs recommended? |
Critical
[Accuracy]
"Model Config Dragon"
Rationale: Severe mistranslation of '大龙虾'. The model translated '龙' as 'Dragon' and '虾' was implied or missed, creating a non-existent term 'Model Config Dragon' instead of recognizing '大龙虾' as 'OpenClaw' (or 'Big Lobster' slang). | Mistranslation of '大龙虾' - should be 'OpenClaw', not 'Model Config Dragon'. '龙虾' is a software tool name. | Misinterprets ‘大龙虾’ which is a playful name related to OpenClaw, not ‘Model Config Dragon’ |
| google/translategemma-12b-it | Title: Has anyone used models configured with "Da Long Xia" on public API sites? Content: Which site is more suitable for daily use? If public API sites don't meet your needs, can you recommend any cheap and stable APIs? | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Have anyone here used large language models with public proxy sites? Content: Which site is most suitable for daily use? If the public proxy site doesn't meet the needs, are there any cheap and stable APIs recommended? |
Major
[Accuracy]
"large language models"
Rationale: Should be 'models from free API proxy sites' | Mistranslation of '大龙虾'. In this context, it refers to 'OpenClaw' (the slang term), not generic LLMs. The hypothesis misses the specific slang reference. | Source is about configuring OpenClaw using models from free proxy sites; hypothesis generalizes to 'LLMs' and loses OpenClaw-specific configuration context. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 各位佬有使用公益站的模型配置大龙虾么?' in en |
Critical
[Accuracy]
"Translation of 'Title: 各位佬有使用公益站的模型配置大龙虾么?' in en"
Rationale: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. | Hypothesis does not provide actual translation. Should be 'Do any of you bros use models from free API proxy sites to configure OpenClaw?' per reference. | Meta-comment only; fails to translate the title and misses 'bros', 'free API proxy site', and 'configure OpenClaw' meanings. | Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: Have anyone used the model configuration for large lobsters on the public welfare site?", "zh-TW": "Title: 是否有人在使用公益站上的大型龙虾模型配置?」、 "ja": "Title: 有人使用过公益站上的大型龙虾模型配置吗?", "ko": "Title: 혹시 누군가 공공 복지 사이트의 대형 랍스터 모델 설정을 사용해 보셨나요?", "de": "Title: Hat jemand die Modelkonfiguration für große Hummer auf der öffentlichen Wohlfahrtssite verwendet?", "fr": "Titre : Quelqu’un a-t-il utilisé la configuration du modèle pour les gros homards sur le site de bienfaisance ?", "it": "Titolo: Qualcuno ha utilizzato la configurazione del modello per i grossi aragoste sul sito di pubblica beneficenza?"}|im_end|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Requesting recommendations for reverse proxy servers | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Requesting recommendations for reverse proxy servers | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: Recommend reverse proxy server Content: So I want to see if I can set up a small machine in Japan or Hong Kong/Singapore as a reverse proxy. Would this have any effect on API speed? If possible, please recommend a good service provider with a good route. Budget is not an issue, just need a small machine. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Request for Reverse Proxy Server Recommendation Content: So I want to see if I can set up a small machine somewhere like Japan or Hong Kong, Singapore as a reverse proxy. Would that have any speed-up effect on the API? If possible, please recommend a service provider with good lines; budget is not an issue, just a small machine. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Richiesta di raccomandazione per un server reverse proxy |
Major
[Accuracy]
"Richiesta di raccomandazione per un server reverse proxy"
Rationale: Only renders the title; entire content about Japan/HK/SG VPS, speeding up API, routing quality, and budget is omitted. | The translation incorrectly translates '反代服务器' as 'reverse proxy' instead of 'reverse proxy server'. | The hypothesis only translates the title and completely omits the entire content paragraph regarding specific locations (Japan, HK, Singapore) and budget. |
| Qwen/Qwen3.5-4B | Title: Looking for recommendations for a reverse proxy server Content: So I'd like to see if setting up a small machine in Japan or Hong Kong/Singapore as a reverse proxy would speed up API performance? If so, could you recommend a service provider with good routes? Budget is not an issue; a small machine will suffice. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Looking for recommendations on reverse proxy servers Content: So I want to see if setting up a small server in Japan or Hong Kong/Singapore could work as a reverse proxy. Would this improve API performance? If so, please recommend a service provider with good routing. Budget is not an issue; just need a small machine. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Looking for a Reverse Proxy Server Content: So I want to see if I can set up a small machine in Japan or Hong Kong or Singapore as a reverse proxy. Would that speed up my API? If possible, could you recommend a reliable service provider? Budget isn't an issue; just need a good connection. | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: Request Recommendation for Reverse Proxy Server Content: So I was wondering if setting up a small server in Japan or Hong Kong, Singapore, etc., as a reverse proxy would improve API speed? If so, could you recommend a service provider with good routes? Budget isn't an issue, just a small server. | Pass (no consensus errors) |
| google/gemma-3-1b-it | So I want to see if I can add a Japanese or Hong Kong Singapore-style machine to a reverse proxy to see if it will improve API speed? If so, please recommend some reliable service providers with no budget constraints, just a small machine. | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: Requesting a Reverse Proxy Server Recommendation Content: So I was wondering if I could set up a small machine in Japan or Hong Kong/Singapore to do reverse proxying and whether it would speed up the API. If possible, please recommend a good route service provider, budget is not an issue, just a small machine. | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Looking for recommendations for reverse proxy servers. Content: So I was wondering if setting up a small server in Japan or Hong Kong, Singapore, etc., as a reverse proxy would improve API performance? If so, could you recommend a service provider with good network connectivity? Budget is not an issue, just need a small server. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Requesting recommendations for reverse proxy servers | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 求推荐反代服务器' in en |
Critical
[Accuracy]
"Translation of 'Title: 求推荐反代服务器' in en"
Rationale: Hypothesis does not provide actual translation. Should be 'Looking for reverse proxy server recommendations' per reference. | Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. | No translation given; only a description of the task. Should deliver 'Looking for reverse proxy server recommendations'. | The hypothesis is a meta-instruction describing the task rather than providing the actual translation. |
| tencent/HY-MT1.5-7B | "en": "Title: Seeking Recommendations for Reverse Proxy Servers", "zh-TW": "标题:寻求推荐的反代理服务器", "ja": "标题:おすすめのリバースプロキシサーバーを探しています", "ko": "제목:추천되는 리버스 프록시 서버를 찾고 있습니다", "de":="Titel: Suche nach Empfehlungen für Reverse-Proxy-Server", "fr":="Titre : Recherche de recommandations pour des serveurs reverse proxy", "it":="Titolo: Cerco raccomandazioni per server reverse proxy"}|im_end|> | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: The 5.3 codex and 5.4 at公益站 are actually fakes by gpt4o: It's not叠甲's fault this thing : |
Major
[Fluency]
"公益站"
Rationale: Left untranslated while rest is English; inconsistent style, though meaning is still clear. | Contains untranslated Chinese terms ('公益站', '叠甲') mixed with English. Poor fluency and incomplete translation. | The sentence is unintelligible due to untranslated Chinese terms. |
| CohereLabs/tiny-aya-water | Title: The 5.3 codex and 5.4 at Public Station are actually fake gpt-4os: It's not叠甲's fault for this |
Major
[Terminology]
"Public Station"
Rationale: Incorrect translation of '公益站' - should be 'free API proxy site' not 'Public Station'. '叠甲' is a transliteration left untranslated (should be explained as 'disclaimer' or similar). Also missing Content translation. | Awkward literal rendering; "free API proxy sites" is clearer and matches reference terminology. | Literal translation of '公益站' (free API proxy sites) loses the specific community meaning.
Critical
[Accuracy]
"It's not叠甲's fault for this"
Rationale: Failed to translate the slang '叠甲' (putting on armor/disclaimer), leaving it in Chinese characters. | Fails to translate '叠甲' (a Chinese internet slang term meaning 'to disclaim/defend oneself preemptively'). Leaves untranslated Chinese characters in English text. | Misinterprets "叠甲" which here means "pre-emptive disclaimer/defense"; target reads as a person's name/responsibility rather than a disclaimer phrase. |
| Qwen/Qwen2.5-14B-Instruct | Title: The 5.3codex and 5.4 of the Community Site are actually impersonated by gpt4o : This isn't the fault of the Community Site Content: Asked some friends about their Community Sites, they all crashed. Suspect chatgpt is adjusting again. First, let's protect the Community Site. 3721: It’s not the fault of the Community Site; the issues with 5.3codex and 5.4 stem from oai's policy, not the Community Site. |
Major
[Terminology]
"Community Site"
Rationale: Less specific than "free API proxy sites"; loses some technical nuance. | Incorrect translation of '公益站' - should be 'free API proxy site' not 'Community Site' | Inconsistent translation of '公益站'. In this context (free API proxies), 'Community Site' is misleading; 'free API proxy sites' (as in reference) is accurate. |
| Qwen/Qwen2.5-7B-Instruct | Title: The 5.3codex and 5.4 of Public Welfare Station are actually gpt4o pretending: It's not the fault of Public Welfare Station on this matter Content: Asked some friends about the Public Welfare Station, they all crashed. Suspect ChatGPT is adjusting again first. 3721: This mistake doesn't lie with the Public Welfare Station's 5.3codex and 5.4, which follow OAI Lu You policy. |
Minor
[Terminology]
"Public Welfare Station"
Rationale: Literal translation of '公益站'. In this context, it refers to 'free API proxy sites' or 'community shared nodes', not a charity station. | Should be 'free API proxy sites' as per reference. | '公益站' refers to free API proxy sites, not 'Public Welfare Station'. Should be 'free API proxy site' or similar. | Should be ‘free API proxy sites’ or similar; literal rendering of 公益站 loses technical nuance.
Major
[Accuracy]
"This mistake doesn't lie with the Public Welfare Station's 5.3codex and 5.4, which follow OAI Lu You policy."
Rationale: Meaning distorted; original says the fault lies with OAI’s routing (路由) policy, not that sites “follow” that policy; “Lu You” left as pinyin and unclear. | '陆游政策' is likely a typo or mishearing of '路由政策' (routing policy). The translation 'Lu You' is incorrect. | Mistranslation of '陆游' (routing) as a proper name 'Lu You' (a poet or name), whereas it refers to routing policies. |
| Qwen/Qwen3-14B | ...} | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: The 5.3 codex and 5.4 of the free shared API site are actually faked as gpt4o : It's not the fault of the public proxy that we're stacking armor Content: Asked several friends about the free shared API site, all exploded. Suspect chatgpt is adjusting first. Stack armor: 3721: This is wrong - the 5.3 codex and 5.4 of the free shared API site are OAI's Luoyu policy, not the fault of the public proxy. |
Major
[Accuracy]
"Luoyu policy"
Rationale: Misrenders “路由政策” (routing policy) as a proper name; changes the technical meaning. | Should be 'routing policies' not 'Luoyu policy' | Mistranslation of '路由' (routing). 'Luoyu' is a phonetic error or hallucination; it should be 'routing policy'. |
| Qwen/Qwen3.5-9B | Title: The 5.3 codex and 5.4 on the free shared API site are actually impersonating gpt4o; this isn't about stacking armor against the public proxy site Content: I asked several friends at the free shared API site, and they all exploded. Suspecting that chatgpt is adjusting again, first let's clarify: 3721: The mistake lies with the 5.3 codex and 5.4 on the free shared API site being oai's Luoyou policy; the error wasn't made by the public proxy site. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Free Shared API Site's 5.3 Codex and 5.4 Are Actually GPT-4 Pretenders : This Doesn't Depend on Free Shared API Sites | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: 5.3 codex and 5.4 of the public proxy are actually impersonations of gpt4o: It's not the fault of the public proxy regarding this matter Content: I asked several friends about their public proxies, and they all crashed. I suspect ChatGPT is being adjusted again. First, let's stack armor. 3721: The mistake lies with the OAI’s Luyu policy for the 5.3 codex and 5.4 of the public proxy. It’s not the public proxy’s fault. |
Major
[Accuracy]
"First, let's stack armor."
Rationale: Should be 'defending myself' or 'preemptively defending myself' - 'stack armor' is incorrect translation of '叠甲' | Literal rendering of slang; intended sense is ‘pre-emptively defending myself / disclaimer’ which is not clearly conveyed. | Literal translation of '叠甲' (internet slang for adding disclaimers/defenses). Should be 'disclaimer' or 'pre-emptive defense'.
Minor
[Terminology]
"It's not the public proxy's fault"
Rationale: '公益站' is better translated as 'free API proxy sites' or 'community sites' in this context. | Should be 'free API proxy site' not 'public proxy' - terminology mismatch | Should be 'free proxy sites' for consistency |
| google/gemma-3-1b-it | The question of the公益站's 5.3 codex and 5.4 is actually GPT4o impersonators. | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: The 5.3 codex and 5.4 of Public API Sites are all impersonating gpt4o : This matter doesn't belong to Public API Sites, it's wrong that Public API Sites 5.3 codex and 5.4 are OAI's Lu Yu policy. |
Critical
[Accuracy]
"This matter doesn't belong to Public API Sites, it's wrong that Public API Sites 5.3 codex and 5.4 are OAI's Lu Yu policy."
Rationale: Misrenders meaning that the fault lies in OAI’s routing policy, not in the free sites; phrase ‘Lu Yu policy’ is incorrect for ‘路由政策’ (routing policy) and sentence is logically garbled | Incorrect translation of '陆游政策' - should be 'routing policies', not 'Lu Yu policy'. This appears to be a mistranslation of '路由' (routing). | Mistranslation of '陆游'. In this context, it is likely a typo or slang for 'routing' (路由) or a specific term, but translating it as a name 'Lu Yu' (the poet) makes no sense. It should be 'routing policy'. |
| google/translategemma-12b-it | Title: The 5.3 codex and 5.4 versions offered by public proxy sites are actually impersonations of GPT4o: It's not the fault of the public proxy site. |
Minor
[Terminology]
"offered by public proxy sites"
Rationale: Reference emphasizes “free API proxy sites”; “public” misses the “free” nuance but core meaning remains. | Should be 'free API proxy sites' as per reference - '公益站' means free proxy site, not public proxy | '公益站' is consistently 'free API proxy sites' in this domain. 'Public' is acceptable but less precise. |
| google/translategemma-4b-it | Title: The 5.3 codex and 5.4 of Public API sites are actually impersonating GPT4o : This is not the fault of Public API sites | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '公益站的5.3codex和5.4其实都是gpt4o冒充的 :叠甲这事不赖公益站啊' in English |
Critical
[Accuracy]
"Translation of '公益站的5.3codex和5.4其实都是gpt4o冒充的 :叠甲这事不赖公益站啊' in English"
Rationale: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. | Hypothesis does not provide actual translation. Should be 'The 5.3 codex and 5.4 on free API proxy sites are actually impersonated by gpt4o: Disclaimer, it's not the free sites' fault' per reference. | Meta text, no translation. Should express that 5.3 codex and 5.4 on free proxy sites are actually GPT-4o pretending, with a disclaimer that it's not the sites' fault. | Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: Both 5.3codex and 5.4 of the Public Welfare Site are actually impersonated by GPT-4O : The armor stacking feature is really thanks to the Public Welfare Site.", "zh-TW": "标题:公益站的5.3codex和5.4实际上都被GPT-4O冒充了:护甲叠加功能确实要归功于公益站!", "ja": "タイトル:公益站の5.3codexと5.4は実際にGPT-4Oによって偽装されています:アーマースタック機能は本当に公益站のおかげです。", "ko": "제목: 공익사이트의 5.3codex와 5.4는 사실 GPT-4O에 의해 가짜로 만들어졌습니다: 아머 스택 기능은 정말 공익사이트 덕분입니다.", "de":="Titel: Sowohl 5.3codex als auch 5.4 der Public Welfare Site werden tatsächlich von GPT-4O nachgeahmt: Die Funktion zum Stapeln von Rüstungen verdanken wir wirklich der Public Welfare Site.", "fr":="Titre : Les versions 5.3codex et 5.4 du site Public Welfare sont en fait usurpées par GPT-4O : La fonction de mise en pile des armures est vraiment grâce au site Public Welfare.", "it":="Titolo: Sia la versione 5.3codex che quella 5.4 del sito Public Welfare sono in realtà state imitate da GPT-4O: La funzione di sovrapposizione delle armature è davvero grazie al sito Public Welfare."}|im_end|> |
Major
[Accuracy]
"armor stacking feature"
Rationale: Reference means 'preemptively defending oneself' (叠甲 = defending/adding disclaimer), not literal armor stacking in gaming | Completely wrong; 叠甲 means pre-emptive disclaimer/defensive statement, not a game feature. | Literal translation of '叠甲'. In internet slang, this means making a disclaimer to protect oneself from criticism, not a game feature. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: [LinuxDo Community] Open Registration |
Major
[Accuracy]
"[LinuxDo Community]"
Rationale: Mistranslation of '方块公益站'. '方块' likely refers to 'Block' (as in the reference) or a specific project name, not 'LinuxDo Community'. 'LinuxDo' seems to be hallucinated from the currency 'LDC' explanation in the reference, but the source title says 'Block' (implied) or a specific name. | Mistranslates '方块公益站' as '[LinuxDo Community]'. Should be '[Block Free API Proxy Site]'. The reference correctly identifies this as a specific service name. | Source is "方块公益站" (Block Free API Proxy Site), not "LinuxDo Community"; conflates different entities. |
| CohereLabs/tiny-aya-water | Title: [LinuxDo Community] Open Registration |
Major
[Accuracy]
"[LinuxDo Community]"
Rationale: Mistranslated '方块公益站' (Block Free API Proxy Site) as 'LinuxDo Community'. While LDC stands for Linux Do Credit, the site name is 'Block'. | "方块公益站" refers to a specific free proxy site (Block Free API Proxy Site), not the general LinuxDo community; important entity mistranslated. | Mistranslates '方块公益站' as '[LinuxDo Community]' when it should be '[Block Free API Proxy Site]'. The reference correctly identifies it as a specific service name. |
| Qwen/Qwen2.5-14B-Instruct | Title: [LinuxDo Community Station] Open Registration Content: Quota can be exchanged through LDC, please go to the wallet page for top-up | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | [Block Public Welfare Station] Open Registration |
Minor
[Terminology]
"[Block Public Welfare Station]"
Rationale: '方块公益站' should be translated as 'Block Free API Proxy Site' or similar, not 'Block Public Welfare Station'. '公益站' is slang for free API proxy site. | 公益站 here specifically means free API proxy site; ‘public welfare station’ is literal and unclear in this context. | '方块' likely refers to the project name 'Block' or 'Cube', but 'Public Welfare Station' is a poor translation for 'free API proxy site'. | Should be 'Block Free API Proxy Site' as per reference. |
| Qwen/Qwen3-14B | [Comunità LinuxDo] Registrazione aperta |
Major
[Accuracy]
"[Comunità LinuxDo]"
Rationale: Mistranslated '方块公益站' (Block Free API Proxy Site). '方块' likely refers to 'Block' (as in the site name), not 'LinuxDo' community, although LDC is mentioned in content, the title translation is inaccurate. | Mistranslates '[方块公益站]' (Block Free API Proxy Site) as '[Comunità LinuxDo]' (LinuxDo Community) - wrong target language (Italian) and wrong meaning | The translation incorrectly translates '方块公益站' as 'LinuxDo' instead of 'Block Free API Proxy Site'. | Source "方块公益站" (Block free proxy site) is misrendered as "LinuxDo community"; incorrect naming. |
| Qwen/Qwen3.5-4B | Title: [LinuxDo Community Station] Registration Open Content: Quotas can be redeemed via LDC; please recharge by going to the wallet page yourself. |
Major
[Terminology]
"[LinuxDo Community Station]"
Rationale: Should be 'Block Free API Proxy Site' not 'LinuxDo Community Station' | Drops the ‘free/public proxy’ nuance present in “公益站”; ‘Community Station’ is vague compared to ‘free proxy site’. | Incorrect translation of '方块公益站' as 'LinuxDo Community Station' instead of 'Block Free API Proxy Site' |
| Qwen/Qwen3.5-9B | Title: [LinuxDo Charity Station] Open Registration Content: Redeem quota via LDC, please recharge on your wallet page yourself. |
Major
[Accuracy]
"[LinuxDo Charity Station]"
Rationale: Should be 'Block Free API Proxy Site'. '方块' (block/square) is the name, not 'LinuxDo'. | '方块公益站' should be rendered as 'Block Free API Proxy Site' or similar; tying it to LinuxDo is not supported by source. | '方块' means 'Block' (likely referring to the site name or logo shape), not 'LinuxDo'. 'Charity Station' is a literal translation of '公益站', 'Free API Proxy Site' is more accurate contextually. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Block Public Welfare Station Registration Open | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: [LinuxDo Charity Platform] Open Registration Content: Quota is redeemed through LDC, please go to the wallet page to recharge. |
Major
[Accuracy]
"[LinuxDo Charity Platform]"
Rationale: ‘方块公益站’ refers to Block free API proxy site; naming it ‘LinuxDo Charity Platform’ is incorrect. | Should be '[Block Free API Proxy Site]' not '[LinuxDo Charity Platform]'. '方块' means 'Block', not 'LinuxDo' | Mistranslation of '方块公益站'. '方块' likely refers to 'Block' (as in reference) or a specific brand, not 'LinuxDo'. 'Charity Platform' is too formal for '公益站' (free/community site). |
| google/gemma-3-1b-it | Title: [方块公益站] Open Registration | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: [LinuxDo Community Block Station] Open Registration Content: Balance converted through LDC, please top up your wallet page. |
Major
[Accuracy]
"[LinuxDo Community Block Station]"
Rationale: Mistranslation of '方块公益站'. '方块' likely refers to 'Block' (as in the site name or concept), but 'LinuxDo Community' is an over-interpretation or hallucination of the community name not present in the source snippet. 'Free API Proxy Site' is the intended meaning of '公益站'. | Should be ‘[Block Free API Proxy Site]’; current translation adds ‘LinuxDo Community’ and changes meaning | Incorrect expansion of '[方块公益站]'. Should be '[Block Free API Proxy Site]' or similar, not 'LinuxDo Community Block Station'. |
| google/translategemma-12b-it | Title: [LinuxDo Charity Platform] Open Registration Content: The amount is redeemed through LDC. Please go to the wallet page to recharge. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: [LinuxDo Community Block Grants] Open Registration |
Major
[Accuracy]
"[LinuxDo Community Block Grants]"
Rationale: Should be 'Block Free API Proxy Site' | Misinterprets '方块公益站' which refers to a 'Block' free API proxy site, not 'Block Grants' or community grants. | Mistranslation of '方块公益站'. '方块' likely refers to 'Block' (as in the site name or concept), but 'Community Block Grants' invents a meaning not present. Reference suggests 'Block Free API Proxy Site'. |
| tencent/HY-MT1.5-1.8B | Login to Square Public Welfare Station |
Minor
[Accuracy]
"Login to Square Public Welfare Station"
Rationale: Completely incorrect translation. The source is about open registration for a free API proxy site, not login. '方块' refers to 'Block' (a platform name), not 'Square'. The hypothesis is completely wrong. | Hallucinated content ('Login') and incorrect translation of '方块' (Block) and '公益站' (Free API Proxy Site). The source text is a title and content about registration, not a login command. | Mistranslates. Source is about '[Block Free API Proxy Site] Open for registration', not a login action; misses 'open registration' and 'free API proxy' meaning. | Incorrect translation. Should be '[Block Free API Proxy Site] Open for registration' per reference. Also does not translate the content about LDC quota redemption. |
| tencent/HY-MT1.5-7B | Title: [Square Public Welfare Station] Open Registration |
Major
[Accuracy]
"[Square Public Welfare Station]"
Rationale: Mistranslation of '方块' (likely a brand name 'Block' or 'Cube') and '公益站' (free proxy site). 'Square Public Welfare Station' is nonsensical. | Mistranslation of '方块公益站' - should be 'Block Free API Proxy Site', not 'Square Public Welfare Station'. Also misses the context that this is a proxy service. | 方块公益站 is a free API proxy site named Block/Square; “Public Welfare Station” misses proxy/API nuance. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Can you recommend some tools that support both reverse proxies and Antigravity Tools? |
Major
[Accuracy]
"both reverse proxies and Antigravity Tools"
Rationale: Original asks for tools that can reverse proxy multiple API services (including Antigravity Tools and Kiro); hypothesis incorrectly suggests tools that support both reverse proxies and Antigravity Tools as separate functions, losing mention of Kiro and multi-target aspect. | Hypothesis is missing the title entirely. It only provides a partial content translation that doesn't match the source structure. | Misinterpretation of the source. The source asks for a tool that can reverse proxy *multiple targets* (Antigravity Tools AND Kiro). The hypothesis asks for a tool that supports 'reverse proxies' (general) AND 'Antigravity Tools' (as a feature), missing the parallel structure of the two specific tools. | Content missing - reference includes detailed content about wanting to try Kiro and supporting multiple reverse proxies |
| CohereLabs/tiny-aya-water | What other good reverse proxy API projects are there now? | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: What are some good reverse proxy API projects available now? Content: I am currently using Antigravity Tools, and plan to try kiro's next. Are there any tools that support reverse proxies simultaneously? Recommendations please. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Are there any good reverse proxy APIs projects around now? Content: I'm currently using Antigravity Tools, and I want to try kiro later. Is there any tool that supports reverse proxy like this? | Pass (no consensus errors) |
| Qwen/Qwen3-14B | 今現在、使いやすいリバースプロキシAPIのプロジェクトはありますか? | Critical<br>[Accuracy]<br>"今現在、使いやすいリバースプロキシAPIのプロジェクトはありますか?"<br>Reason: Wrong target language - hypothesis is in Japanese instead of English. Should translate from Chinese source to English \| Wrong target language. The target language is Chinese (zh), but the hypothesis is in Japanese. \| The translation incorrectly translates '反代API' as 'リバースプロキシAPI' instead of 'reverse proxy API'. |
| Qwen/Qwen3.5-4B | Title: Are there any other useful reverse proxy API projects available now? Content: I'm currently using Antigravity Tools and plan to try kiro's next. Is there any tool that supports reverse proxying simultaneously? Could you recommend some? | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Are there any other useful reverse proxy API projects available now? Content: I'm using Antigravity Tools right now and want to try kiro later. Is there any tool that can support reverse proxying all of this at once? Any recommendations from experts? | Minor<br>[Style]<br>"experts"<br>Reason: Should be 'Bros' to match informal reference tone \| '佬们' is informal community slang for 'bros' or 'guys', 'experts' is too formal. \| Changes informal '佬们/bros' tone to more formal 'experts'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: There are still some useful reverse proxy API projects available today? | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: What are some good reverse proxy API projects available now? Content: I'm using Antigravity Tools right now, and I want to try kiro later. Are there any tools that can support reverse proxying all of these at the same time? Please recommend some. | Pass (no consensus errors) |
| google/gemma-3-1b-it | "en": "What are some good use cases for an anti-proxy API project?", "zh-TW": "现在有什么好用的反代API项目可以尝试吗?", "ja": "今頃有何好用的反代API項目可以嘗試嗎?", "ko": "지금 사용 가능한 반대 시그널 API 프로젝트가 있을까요?”, "de": "Welche guten Anwendungsfälle gibt es für ein Anti-Proxy-API-Projekt?", "fr": "Quels sont des cas d'utilisation intéressants pour une API anti-proxy ?", "it": "Ci sono progetti interessanti per API anti-proxy?" } | Critical<br>[Accuracy]<br>""en": "What are some good use cases for an anti-proxy API project?","<br>Reason: Hypothesis is a JSON with multiple languages, not a proper translation - only the English part is relevant but it doesn't cover the Content at all \| Misinterprets 反代API as anti-proxy instead of reverse proxy; also asks about use cases rather than projects available now. \| Hypothesis is not a translation but a JSON structure with multiple languages. Should provide only English translation. Also 'anti-proxy' is incorrect - should be 'reverse proxy'. \| The hypothesis is a JSON object containing translations into multiple languages and mistranslations ('anti-proxy' instead of 'reverse proxy'). It is not a valid English translation of the source text. \| Missing content translation. |
| google/gemma-3-4b-it | Are there any useful reverse proxy API projects available now? | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Are there any other useful reverse proxy API projects out there? Content: I'm using Antigravity Tools now, and I want to try kiro later. Is there a tool that can support reverse proxies for all of these at the same time? Please recommend some. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Are there any useful reverse proxy API projects available now? | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 现在还有一些什么好用的反代API的项目啊?' in en | Critical<br>[Accuracy]<br>"Translation of 'Title: 现在还有一些什么好用的反代API的项目啊?' in en"<br>Reason: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. \| Hypothesis does not provide actual translation. Should be 'What are some good reverse proxy API projects available right now?' per reference. \| No translation provided; should render as 'What are some good reverse proxy API projects available right now?'. \| Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | What are some good reverse proxy API projects available right now? | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Don't ask if you don't understand: cpa+ccr+cc Content: I'm using the 5.3 Codex model. How can I modify the configuration in CCR to use Codex XHigh? | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | Title: Don't Ask If You Don't Know: cpa+ccr+cc Content: I'm using the 5.3 Codex model. How can I modify the configuration in CCR to use Codex XHigh? | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: If In Doubt, Ask: cpa+ccr+cc Content: I am using the 5.3 Codex model. How can I modify the configuration in CCR to make CC use Codex XHigh? | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: Ask if you don't understand: cpa+ccr+cc Content: I am using the 5.3 codex model, how can I modify the configuration in ccr to make cc use codex xhigh? | Major<br>[Terminology]<br>"cc"<br>Reason: Should be expanded/clarified as Claude Code to match context, not left ambiguous ‘cc’ in English translation. \| Should expand to 'CPA + CCR + Claude Code' with explanations. The hypothesis leaves abbreviations unexpanded. \| Should be 'CPA + CCR + Claude Code' as per reference. \| The abbreviations should be expanded: CPA (Claude Code Proxy Account), CCR (Claude Code Router), CC (Claude Code). The reference shows these expansions. \| Did not expand the abbreviation 'cc' to 'Claude Code' as done in the reference, though acceptable if context is known. |
| Qwen/Qwen3-14B | Se non capisci, chiedi qui: cpa+ccr+cc | Major<br>[Accuracy]<br>"Se non capisci, chiedi qui: cpa+ccr+cc"<br>Reason: Title reasonably rendered, but "cc" should be "Claude Code"; loses expansion present in reference. \| Mistranslated the idiom '不懂就问' (Asking because I don't understand / No shame in asking) as a command 'If you don't understand, ask here'. Also omitted the content completely. \| The translation incorrectly translates 'cpa+ccr+cc' as 'cpa+ccr+cc' instead of 'CPA + CCR + Claude Code'. \| Wrong target language - hypothesis is in Italian. Also incomplete - missing content section. 'cc' should be expanded to 'Claude Code' |
| Qwen/Qwen3.5-4B | Title: Asking without knowing: cpa + ccr + cc Content: I am using the 5.3 Codex model. How can I modify the configuration in CCR to make CC use Codex XHigh? | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Ask if you don't understand: cpa + ccr + cc Content: I'm using the 5.3 Codex model. How do I modify the configuration in CCR to make CC use Codex XHigh? | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Don't know, ask: cpa+ccr+cc\nContent: I'm using the 5.3 Codex model. How to modify configurations in CCR to make CC use Codex xhigh? | Pass (no consensus errors) |
| google/gemma-3-12b-it | Question: cpa+ccr+cc | Critical<br>[Accuracy]<br>"Question: cpa+ccr+cc"<br>Reason: Missing title entirely. Should be 'Asking a question because I don't understand: CPA + CCR + Claude Code' \| Failed to translate the title's explanatory phrase '不懂就问' and the acronyms' meanings (Claude Code Router, Claude Code) which are implied or expanded in context. \| Title is missing entirely - should be 'Asking a question because I don't understand: CPA + CCR + Claude Code'. Content is completely missing. \| Title only; the entire explanatory content about using 5.3 codex with CCR and CC is omitted. |
| google/gemma-3-1b-it | Translation of '不懂就问:cpa+ccr+cc' in en | Critical<br>[Accuracy]<br>"Translation of '不懂就问:cpa+ccr+cc' in en"<br>Reason: The hypothesis is a meta-commentary instruction rather than a translation. It fails to translate the Title or the Content. \| Not a translation - just describes that it's a translation request without providing actual translation \| Hypothesis is not a translation but a meta-description. Should provide actual English translation of the title and content. \| Not an actual translation; fails to render the title or the explanatory content about using 5.3 codex with CCR/CC. |
| google/gemma-3-4b-it | Ask if you don't understand: cpa+ccr+cc | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Ask Me Anything: cpa + ccr + cc Content: I am using the 5.3 Codex model. How can I modify the configuration in ccr to use the Codex xhigh setting for cc? | Major<br>[Accuracy]<br>"Ask Me Anything:"<br>Reason: Misinterprets “不懂就问” which is “asking because I don’t understand”, not an AMA-style invitation. \| Mistranslated '不懂就问' (Asking because I don't understand). 'Ask Me Anything' implies the speaker is answering questions, whereas the source indicates the speaker is asking. \| Source says '不懂就问' which means 'Asking a question because I don't understand', not 'Ask Me Anything' \| Mistranslates title. Should be 'Asking a question because I don't understand: CPA + CCR + Claude Code' - more accurate to source meaning.<br>Major<br>[Terminology]<br>"cc"<br>Reason: Should clarify CCR as 'Claude Code Router' and CC as 'Claude Code' for technical accuracy. \| Should be 'CPA + CCR + Claude Code' - cc refers to Claude Code, not lowercase \| Should explicitly map cc to Claude Code to match reference; leaving as “cc” reduces clarity in this context. |
| google/translategemma-4b-it | Title: Ask if you don't understand: cpa+ccr+cc Content: I am using the 5.3 codex model, how do I modify the configuration in ccr to use cc with codex xhigh? | Major<br>[Terminology]<br>"cc"<br>Reason: Should expand abbreviations: CPA (Claude Proxy API), CCR (Claude Code Router), CC (Claude Code) \| Does not expand 'cc' to 'Claude Code', which may reduce clarity, though abbreviation matches source. \| Hypothesis leaves 'cc' unexplained but should clarify it as 'Claude Code' - terminology not properly expanded |
| tencent/HY-MT1.5-1.8B | Translation of '不懂就问:cpa+ccr+cc' in en | Critical<br>[Accuracy]<br>"Translation of '不懂就问:cpa+ccr+cc' in en"<br>Reason: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. \| Hypothesis does not provide actual translation. Should be 'Asking a question because I don't understand: CPA + CCR + Claude Code' per reference. \| Meta description only; does not translate title. Should output something like 'Asking because I don't understand: CPA + CCR + CC'. \| Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: Ask if you don’t understand: cpa + ccr + cc", "zh-TW": "Title: If you don’t understand, just ask: cpa + ccr + cc", "ja": "Title: 分からない場合は質問してください:cpa + ccr + cc", "ko": "제목: 이해가 안 되면 물어보세요: cpa + ccr + cc", "de":="Titel: Fragen Sie, wenn Sie es nicht verstehen: cpa + ccr + cc", "fr":="Titre: Si vous ne comprenez pas, demandez des explications : cpa + ccr + cc", "it":="Titolo: Se non capite, chiedete spiegazioni: cpa + ccr + cc"} | Pass (no consensus errors) |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Asking for advice! How is Shrimp farm done? | Major<br>[Accuracy]<br>"Shrimp farm"<br>Reason: Content missing - reference includes detailed content about running multiple instances and 'nest' of OpenClaws \| Mistranslates '多开龙虾' as 'Shrimp farm'. Should be 'run multiple instances of OpenClaw'. The metaphor is completely lost. \| Mistranslation of '多开龙虾'. '多开' means 'run multiple instances'. 'Shrimp farm' implies a literal farm or a bot farm, missing the technical meaning of running multiple local instances of the software. \| Title should be about running multiple instances of OpenClaw, not "Shrimp farm"; loses "multi-instance" aspect and tool name OpenClaw. |
| CohereLabs/tiny-aya-water | Title: Asking for advice! How is Crab Cakes made? | Critical<br>[Accuracy]<br>"How is Crab Cakes made?"<br>Reason: Incorrect translation of '龙虾' as 'Crab Cakes' instead of 'OpenClaw' \| Completely mistranslates "多开龙虾" (running multiple OpenClaw instances) as cooking "Crab Cakes"; changes topic and meaning entirely. \| Completely mistranslates '多开龙虾' (running multiple instances of OpenClaw) as 'How is Crab Cakes made?'. This is a severe semantic error. \| Severe mistranslation of '多开龙虾' (running multiple instances of OpenClaw) as 'Crab Cakes' (food). |
| Qwen/Qwen2.5-14B-Instruct | Title: Seeking advice! How to run multiple lobsters at once Content: Recently I've noticed some terms like 'a bunch of lobsters', but I only know how to start one locally. How do you manage to run multiple lobsters? Curious and seeking your guidance. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: How to Multi-Open Lobsters? I Don't Know How to Do It\nContent: Recently, I found some terms like 'one den of lobsters'. I can only locally start and open one lobster at a time. How do you multi-open them? I'm very curious and seek advice from all the experts.", "zh-TW": "標題:多開龍蝦是怎麼做到的啊\n內容:最近發現有些養「窩」龍蝦的詞彙,但我只能當地啟動開一個龍蝦,多開又是怎麼做到的呢?很好奇,請教各位大神", "ja": "タイトル:マルチ開くロブスターはどのようにするのですか? 内容:最近、「一窟のロブスター」という言葉を見かけましたが、私は一度に一つしかロブスターを開けません。マルチ開き方はどのような方法で実現するのでしょうか?とても興味がありますので、 Experts の皆様にご教示いただけますと幸いです", "ko": "제목: 여러 개 열기 어떻게 해야 하나요? 내용: 최근에 '하나의 동굴에서 게를 키우는'이라는 표현을 봤는데, 저는 로브스타를 한 번만 열 수 있습니다. 여러 개 열기는 어떻게 하는 건가요? 매우 호기심이 생겼습니다. Experts 분들께 도움을 청합니다", "de": "Titel: Wie man mehrere Krabbensorten öffnet – ich weiß es nicht!\nInhalt: Ich habe kürzlich einige Begriffe wie „eine Höhle mit Krabben“ entdeckt. Ich kann nur eine Krabbe lokal starten und öffnen. Wie mache ich mehrere gleichzeitig auf? Ich bin sehr neugierig und bitte um Rat von Experten", "fr": "Titre : Comment ouvrir plusieurs langoustes ? Je ne sais pas comment faire !\nContenu : J'ai récemment découvert certains termes comme « un repaire de langouste ». Je peux seulement démarrer et ouvrir une langouste à la fois. Comment fait-on pour en ouvrir plusieurs ? Je suis très curieux et je demande conseil aux experts", "it": "Titolo: Come aprire più granchi aragosta? Non lo so come fare! Contenuto: Ho recentemente scoperto alcuni termini come «un rifugio di aragoste». Posso solo avviare e aprire una aragosta alla volta. Come si fa ad aprirne più contemporaneamente? Sono molto curioso e chiedo consiglio agli esperti" } | Major<br>[Terminology]<br>"lobsters"<br>Reason: Should be 'OpenClaw' as per reference. \| Failed to translate the slang '龙虾' (OpenClaw), using the literal animal name. \| Here 龙虾 refers to OpenClaw instances; literal ‘lobsters’ is inappropriate. \| '龙虾' is slang for OpenClaw, not literal lobsters. Should be 'OpenClaw' throughout. |
| Qwen/Qwen3-14B | Aiuto! Come si avviano più istanze di granchi? | Minor<br>[Accuracy]<br>"Come si avviano più istanze di granchi?"<br>Reason: Wrong target language - hypothesis is in Italian. Also mistranslates '龙虾' (OpenClaw) as 'granchi' (crabs) \| Only title-level paraphrase; full explanatory body about "nest" of OpenClaws, local single instance, and curiosity is omitted. \| The translation incorrectly translates '龙虾' as 'granchi' instead of 'OpenClaw'. \| Mistranslated the slang '龙虾' (OpenClaw) as literal 'granchi' (crabs). |
| Qwen/Qwen3.5-4B | Title: Asking! How is multi-instance lobster farming done? Content: I recently noticed some terms like raising a 'clawful nest' of lobsters, but I only know how to launch one instance locally. How do they achieve multi-instances? I'm very curious and seeking advice from fellow enthusiasts. | Critical<br>[Accuracy]<br>"multi-instance lobster farming"<br>Reason: “养一‘窝’龙虾” is playful metaphor about running many OpenClaw instances; ‘lobster farming’ suggests literal animals. \| Failed to translate the slang '龙虾' (OpenClaw) throughout the text. \| Should be 'OpenClaw'. '龙虾' is slang for OpenClaw software, not literal lobsters.<br>Major<br>[Terminology]<br>"lobsters"<br>Reason: Incorrect translation of '龙虾' as 'lobster' instead of 'OpenClaw' \| Fails to preserve OpenClaw as the specific software term, reverting to literal animals. \| Should be 'OpenClaw' throughout, not 'lobster' |
| Qwen/Qwen3.5-9B | Title: Asking for help! How is multi-instance running of Lobster achieved? Content: I recently noticed some terms like raising a "den" of lobsters, but I only know how to start one locally. How is multi-instance running done? Very curious, asking fellow enthusiasts. | Major<br>[Terminology]<br>"Lobster"<br>Reason: Should retain product name OpenClaw; literal 'Lobster' loses specific reference. \| Should be 'OpenClaw' not 'Lobster' (slang term not translated) \| Should be translated as 'OpenClaw' instead of 'Lobster'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | I recently discovered some terms for raising a "nest" of shrimp or something similar, but I can only open one shrimp at a time locally. How do you manage to open multiple ones? Curious, seeking advice from fellow enthusiasts. | Major<br>[Terminology]<br>"shrimp"<br>Reason: Incorrect translation of '小龙虾' - should be 'OpenClaw' not 'shrimp'. '一窝' means 'a nest/multi-instance' not 'a nest of shrimp'. \| Literal translation of '龙虾' (shrimp) instead of 'OpenClaw'. The metaphor 'nest' makes no sense without the correct software name. \| Literalizes 龙虾 instead of established term OpenClaw used in context/reference. |
| google/gemma-3-12b-it | Question: How is multi-instance Lobster done? | Critical<br>[Accuracy]<br>"Question: How is multi-instance Lobster done?"<br>Reason: Only the title-level question is translated; all contextual content about ‘nest of OpenClaws’ and local single instance is omitted. \| Title is incomplete - should be 'Need advice! How do you run multiple instances of OpenClaw?'. Content is completely missing. \| Incomplete translation. Missed the title prefix '求教!' and the entire Content section explaining the context of 'nest' and local startup. \| '龙虾' should be 'OpenClaw', not 'Lobster'. Also missing content translation. |
| google/gemma-3-1b-it | Translation of '求教!多开龙虾是怎么做的啊' in en | Critical<br>[Accuracy]<br>"Translation of '求教!多开龙虾是怎么做的啊' in en"<br>Reason: The hypothesis is a meta-commentary instruction rather than a translation. It fails to translate the Title or the Content. \| Hypothesis is not a translation but a meta-description. Should provide actual English translation of the title and content. \| Meta-text instead of translation; ignores both title and detailed content about running multiple instances (a ‘nest’) of OpenClaw. \| Missing content translation. \| Not a translation - just describes that it's a translation request without providing actual translation |
| google/gemma-3-4b-it | Asking for help! How is multi-opening lobster made? | Pass (no consensus errors) |
| google/translategemma-12b-it | Question: How is multi-instance running of Lobster achieved? | Critical<br>[Accuracy]<br>"Question: How is multi-instance running of Lobster achieved?"<br>Reason: 1. Translated '龙虾' literally as 'Lobster' instead of 'OpenClaw'. 2. Omitted the entire 'Content' section. \| Should be 'Need advice! How do you run multiple instances of OpenClaw?' - title format and terminology error \| Missing title translation. Mistranslates '龙虾' as 'Lobster'. Should be 'OpenClaw'. Also missing content translation. |
| google/translategemma-4b-it | How to open multiple lobsters? | Critical<br>[Accuracy]<br>"How to open multiple lobsters?"<br>Reason: Literal translation of '多开龙虾' (run multiple instances of OpenClaw). Treats software as animals. \| Literal 'lobsters' mistranslates '龙虾' (OpenClaw instances) and misses 'run multiple instances' meaning. \| Should be 'OpenClaw' |
| tencent/HY-MT1.5-1.8B | Translation of '求教!多开龙虾是怎么做的啊' in en | Critical<br>[Accuracy]<br>"Translation of '求教!多开龙虾是怎么做的啊' in en"<br>Reason: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. \| Only a meta phrase; should translate as 'Need advice! How do you run multiple instances of OpenClaw?'. \| Hypothesis does not provide actual translation. Should be 'Need advice! How do you run multiple instances of OpenClaw?' per reference. \| Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: How to Multiple-Open Lobsters?", "zh-TW": "Title: 如何多开龙虾?", "ja": "Title: ロブスターを複数開く方法は?", "ko": "Title: 어떻게 로브스터를 여러 마리 동시에 키우나요?", "de":="Titel: Wie öffnet man mehrere Lobster gleichzeitig?", "fr": "Titre: Comment ouvrir plusieurs lobster en même temps ?", "it": "Titolo: Come si fanno più aperti i lobster contemporaneamente?"} | Critical<br>[Accuracy]<br>"How to Multiple-Open Lobsters? / 多开龙虾"<br>Reason: Misinterprets 龙虾 as literal lobsters, instead of OpenClaw instances. \| Mistranslation of '龙虾' - should be 'OpenClaw', not 'Lobsters'. The hypothesis provides multiple language versions when only English was requested. \| Hypothesis is a JSON object with multiple languages. It translates '多开龙虾' literally as opening multiple lobsters, missing the meaning of running multiple instances of the software. |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: How do I determine if GPT-5.4 has been shadowbanned? | Pass (no consensus errors) |
| CohereLabs/tiny-aya-water | How do I determine if GPT-5.4 has been shadowbanned? | Minor<br>[Accuracy]<br>"How do I determine if GPT-5.4 has been shadowbanned?"<br>Reason: Partial meaning: retains shadowban but omits "downgraded" aspect present in "降智" and reference. \| Mistranslates '降智' as 'shadowbanned' when it should be 'shadowbanned/downgraded'. The reference shows both terms are needed to capture the full meaning. \| Source repeats the title in content; hypothesis merges them. Acceptable but slightly less precise than reference regarding 'downgraded'. |
| Qwen/Qwen2.5-14B-Instruct | Title: How to determine if GPT 5.4 has been nerfed? Content: How to determine if GPT 5.4 has been nerfed? | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | How to determine if GPT5.4 has been nerfed? | Minor<br>[Terminology]<br>"nerfed"<br>Reason: 'Nerfed' is acceptable gaming slang, but 'shadowbanned' or 'downgraded' (as in reference) might be more precise for '降智' in the context of LLM quality reduction. \| Captures ‘降智’ partially but misses shadowban/downgrade nuance; could be more precise like ‘downgraded’ or ‘dumbed down’ depending on context. \| Should be 'How to tell if GPT-5.4 has been shadowbanned/downgraded?' The term '降智' is better translated as 'shadowbanned/downgraded' in this context, not 'nerfed'. \| Should be 'shadowbanned/downgraded' as per reference. |
| Qwen/Qwen3-14B | Title: How to determine if GPT5.4 has been nerfed? Content: How to determine if GPT5.4 has been nerfed? | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | How to determine if GPT-5.4 has been nerfed? | Minor<br>[Terminology]<br>"nerfed"<br>Reason: Incorrect translation of '降智' as 'nerfed' instead of 'shadowbanned/downgraded' \| Should be 'shadowbanned/downgraded' not 'nerfed' \| Should be 'shadowbanned/downgraded' for more precise terminology in this context. 'Nerfed' is gaming slang and less precise than the reference. \| Captures performance downgrade but misses the ‘shadowban’ aspect in the reference; partial semantic mismatch. \| 'Nerfed' is acceptable gaming slang, but 'shadowbanned' or 'downgraded' (as in reference) might be more precise for '降智' in the context of LLM quality reduction. However, 'nerfed' is not a strict error. |
| Qwen/Qwen3.5-9B | Title: How to tell if GPT 5.4 has been nerfed? Content: How to tell if GPT 5.4 has been nerfed? | Minor<br>[Terminology]<br>"nerfed"<br>Reason: Should be 'shadowbanned/downgraded' as in reference. '降智' refers to model degradation, not gaming nerfs. \| '降智' specifically refers to intelligence downgrade or shadowban. 'Nerfed' is close but 'shadowbanned/downgraded' is more precise per reference. \| Should be translated as 'shadowbanned/downgraded' instead of 'nerfed'. \| Should be 'shadowbanned/downgraded' not 'nerfed' (gaming slang not appropriate) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | How to judge if GPT 5.4 has been downgraded? | Pass (no consensus errors) |
| google/gemma-3-12b-it | How to tell if GPT-5.4 has been nerfed? | Pass (no consensus errors) |
| google/gemma-3-1b-it | How to determine if GPT5.4 has been downgraded? | Pass (no consensus errors) |
| google/gemma-3-4b-it | How to determine if GPT 5.4 has been downgraded? | Pass (no consensus errors) |
| google/translategemma-12b-it | How to tell if GPT-5.4 has been nerfed? | Pass (no consensus errors) |
| google/translategemma-4b-it | How to determine if GPT 5.4 has been downgraded? | Minor<br>[Accuracy]<br>"downgraded"<br>Reason: Incomplete translation; the Content section is identical to the title in the source, but the hypothesis only provides the title, effectively dropping the repetition which might be intentional emphasis, or simply failing to format it as Content. More critically, it misses the nuance of 'shadowbanned' vs 'downgraded' often implied by '降智' in this community, though 'downgraded' is acceptable. \| Should be 'shadowbanned/downgraded' \| Conveys '降智' as downgrade but omits nuance of being 'dumbed down' or shadowbanned; still mostly acceptable. |
| tencent/HY-MT1.5-1.8B | How to determine if GPT5.4 has been downgraded in intelligence? | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | "en": "How to determine if GPT5.4 has been downgraded?", "zh-TW": "如何判断GPT5.4是否被降级了?", "ja": "どのようにしてGPT5.4がダウングレードされたかを判断するか?", "ko": "어떻게 GPT5.4가 성능 저하를 겪었는지 판단하는가?」 , "de": "Wie kann man feststellen, ob GPT5.4 herabgestuft wurde?", "fr": "Comment savoir si GPT5.4 a été dégradé ?", "it": "Come si può determinare se GPT5.4 è stato degradato?"}\|im_end\|> | Major<br>[Terminology]<br>"downgraded"<br>Reason: Should be 'shadowbanned/downgraded' as per reference, not just 'downgraded' \| Incomplete translation - should be 'shadowbanned/downgraded' as shown in reference. The term '降智' has specific meaning in this context. \| 'Shadowbanned' or 'lobotomized' might be more precise for '降智' in this context, but 'downgraded' is acceptable. The main error is the JSON format. |
| Model under test | Machine translation output (Hypothesis) | Adjudication result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Dandelion公益机场 this month is suspended activity, very sorry! | Major<br>[Fluency]<br>"Dandelion公益机场 this month is suspended activity, very sorry!"<br>Reason: Mixes untranslated Chinese ('公益机场') with English. Should be 'Dandelion free proxy provider'. Poor fluency and incomplete translation. \| Unnatural English word order and missing article; should be "Dandelion free proxy provider is suspending activities this month, very sorry!" \| Chinglish syntax. Should be 'activities are suspended this month'. |
| CohereLabs/tiny-aya-water | Title: Dandelion公益机场 this month is suspended activities, very sorry! | Pass (no consensus errors) |
| Qwen/Qwen2.5-14B-Instruct | Title: The Public Dandelion Charity Proxy Service is suspended this month, we apologize! Content: Currently, the proxy service website cannot be accessed or logged into; please wait for repairs, node usage will not be affected. | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title:蒲公英公益机场这个月暂停活动,非常抱歉! -> Proxy Node for Public Welfare is Suspended This Month, We Apologize! Content:现阶段机场处于无法访问和登录页面,等待修复即可,节点使用不受影响。-> The proxy node is currently inaccessible and login is unavailable; please wait for the issue to be resolved. Node usage will not be affected. | Major<br>[Terminology]<br>"Proxy Node for Public Welfare"<br>Reason: Literal and vague; ‘free proxy/VPN provider’ is clearer for 公益机场. \| Mistranslation of '公益' in the context of '机场' (proxy provider). It should be 'free' or 'community-run', not 'public welfare'. \| '公益机场' refers to a free proxy/VPN service provider, not 'Proxy Node for Public Welfare'. Should be 'free proxy provider' or similar. |
| Qwen/Qwen3-14B | Title: The 'Dandelion Public Welfare Proxy Provider' has suspended activities this month. We apologize! Content: At this stage, the proxy provider is unable to access and log in pages. Please wait for the fix. Node usage is not affected. | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Puffin Public Airport activity suspended this month, very sorry! Content: The airport is currently inaccessible and the login page cannot be accessed. Please wait for repairs; node usage remains unaffected. | Major<br>[Accuracy]<br>"Puffin Public Airport"<br>Reason: Mistranslated '蒲公英' (Dandelion) as 'Puffin'. Also, 'Airport' is a literal translation of the slang '机场' (proxy provider), which should be localized to 'proxy provider' or similar for clarity. \| Misinterprets “蒲公英” (Dandelion) as Puffin; brand/name translation is incorrect. \| Should be 'Dandelion free proxy provider'. '蒲公英' (dandelion) is mistranslated as 'Puffin'. Also '机场' (airport) is slang for proxy provider, not literal airport.<br>Major<br>[Terminology]<br>"airport"<br>Reason: Should be 'Dandelion free proxy provider' not 'Puffin Public Airport' \| Incorrect translation of '蒲公英公益机场' as 'Puffin Public Airport' instead of 'Dandelion free proxy provider' \| Literal rendering of “机场” may confuse readers unfamiliar with Chinese VPN slang (proxy/VPN provider). |
| Qwen/Qwen3.5-9B | Title: The Dandelion Public Welfare Proxy Provider is pausing activities this month; we apologize for any inconvenience! Content: At present, the proxy provider is inaccessible and the login page is unavailable. Please wait for repairs; node usage remains unaffected. | Minor<br>[Terminology]<br>"Public Welfare Proxy Provider"<br>Reason: '公益机场' in context is free VPN/airport service; 'free proxy provider' is closer than 'public welfare proxy provider'. \| Should be 'Dandelion free proxy provider' - simpler and more direct. \| Should be 'free proxy provider' not 'Public Welfare Proxy Provider' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Poppyseed Public Proxy Service Temporarily Suspended This Month, Apologies! | Pass (no consensus errors) |
| google/gemma-3-12b-it | Title: The Puffballs VPN proxy service will be suspended this month, we apologize for any inconvenience! | Major<br>[Accuracy]<br>"VPN proxy service"<br>Reason: Original is ‘公益机场’ (free proxy provider); adding ‘VPN’ slightly narrows the meaning but is close. \| Mistranslation of '蒲公英' (Dandelion). 'Puffballs' is a different fungus. \| '蒲公英' (dandelion) should be 'Dandelion', not 'Puffballs'. Also should be 'free proxy provider' not 'VPN proxy service' |
| google/gemma-3-1b-it | The airport is currently inaccessible and unable to log in, awaiting repair. Node usage is unaffected. | Major<br>[Accuracy]<br>"The airport is currently inaccessible and unable to log in, awaiting repair. Node usage is unaffected."<br>Reason: Missing the title translation entirely. Also 'airport' is jargon for proxy provider - should clarify this is about a proxy service, not an actual airport. \| The hypothesis completely omits the Title translation. It also fails to translate '机场' (proxy provider) correctly, using the literal 'airport' which is confusing in this context. \| Slight inaccuracy - '节点使用不受影响' means 'VPN node usage is not affected' |
| google/gemma-3-4b-it | Title: Wildflower Charity Airport has temporarily suspended activities, we apologize for any inconvenience! | Major<br>[Accuracy]<br>"Wildflower Charity Airport"<br>Reason: Original name is ‘蒲公英公益机场’ (Dandelion); ‘Wildflower’ is incorrect \| Mistranslation of '蒲公英'. '蒲公英' means 'Dandelion', not 'Wildflower'. 'Airport' is the correct slang for proxy provider. \| Mistranslation of '蒲公英' - should be 'Dandelion', not 'Wildflower'. Also '公益机场' should be 'free proxy provider', not 'Charity Airport'. |
| google/translategemma-12b-it | Title: Dandelion Charity Proxy Service will be suspending activities this month. We sincerely apologize! | Minor<br>[Terminology]<br>"Proxy Service"<br>Reason: “机场” here is more like “proxy provider/VPN provider”; “proxy service” is understandable but slightly generic. \| Should be 'Dandelion free proxy provider' not 'Charity Proxy Service' - more accurate terminology. \| Should be 'free proxy provider' - '公益' means free/non-profit, not charity \| '公益机场' is 'free proxy provider'. 'Charity' has a slightly different connotation than 'free/community-run' in this context. |
| google/translategemma-4b-it | Title: Puggery Charity Airport has suspended activities this month, very sorry! | Major<br>[Accuracy]<br>"Puggery Charity Airport"<br>Reason: Should be 'Dandelion free proxy provider' \| Mistranslation of '蒲公英' (Dandelion). 'Puggery' is incorrect. \| '蒲公英' means 'Dandelion'; 'Puggery' is incorrect and inappropriate. '公益机场' is 'free/proxy provider', not literal airport. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '蒲公英公益机场这个月暂停活动,非常抱歉!' in en", "zh-TW": "Translation of '蒲公英公益机场这个月暂停活動,非常抱歉!' in zh-TW", "ja": "Translation of '蒲公英公益机场这个月暂停活動,非常抱歉!' in ja", "ko": "Translation of '蒲公英公益机场这个月暂停活動,非常抱歉!' in ko", "de": "Translation of '蒲公英公益机场这个月暂停活動,非常抱歉!' in de", "fr": "Translation of '蒲公英公益机场这个月暂停活動,非常抱歉!' in fr", "it": "Translation of '蒲公英公益机场这个月暂停活動,非常抱歉!' in it} | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | Title:蒲公英公益机场 this month is suspended; we apologize for the inconvenience! |
Minor
[Fluency]
"蒲公英公益机场 this month is suspended"
Reason: Awkward phrasing mixing Chinese characters and English grammar. | Mixing Chinese and English - should fully translate '蒲公英公益机场' as 'Dandelion free proxy provider' for fluency. | Awkward mixture of Chinese and English; should translate provider name or format consistently in English title. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: 【Feedback Community】Share 2000 Codex Previews, Take What You Want |
Major
[Accuracy]
"Codex Previews"
Reason: Mistranslation of '普号'. '普号' means 'standard accounts' or 'regular accounts', not 'Previews'. | Original is "codex普号" meaning standard/free accounts, not "previews". | Missing 'Feedback Community' brackets and context about importing to CPA |
| CohereLabs/tiny-aya-water | Title: 【Feedback Community】Share 2000 codex issues, everyone can pick up |
Major
[Accuracy]
"Share 2000 codex issues"
Reason: Mistranslated '普号' (standard/free accounts) as 'issues'. | Mistranslates '普号' as 'issues' when it should be 'standard/free accounts'. The term refers to account types, not issues. | "issues" mistranslates "普号" which means standard/free accounts; changes meaning from accounts to problems/issues. |
| Qwen/Qwen2.5-14B-Instruct | Title: [Give Back to Community] Share 2000 codex普号, everyone take what you need Content: Import cpa and use directly, supports gpt-5.4 | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Title: [Return to Community] Share 2000 codex LinuxDo accounts, take what you need,各位大佬 Content: Import cpa and use directly, supports gpt-5.4 | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: [Community Feedback] Share 2000 codex general accounts, everyone can take them freely | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: [Community Feedback] Share 2000 codex accounts, everyone is welcome to take them Content: Import CPA directly, supports gpt-5.4 |
Minor
[Terminology]
"codex accounts"
Reason: Should be 'Codex standard/free accounts' for more precision. Reference clarifies the account type. | Vague translation of 'codex普号' as 'codex accounts' instead of 'Codex standard/free accounts' | Should be 'Codex standard/free accounts' not 'codex accounts' |
| Qwen/Qwen3.5-9B | Title: [Community Appreciation] Share 2000 codex Puhao numbers, everyone can take them freely Content: Import CPA directly for use, supports GPT-5.4 | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Feedback Community: Share 2000 Codex Pro Numbers, Take Them Guys |
Major
[Accuracy]
"Codex Pro Numbers"
Reason: Mistranslated '普号' (standard/free accounts) as 'Pro Numbers' (opposite meaning). | Translates '普号' as 'Pro Numbers' when it means 'standard/free accounts'. Should be 'standard/free accounts' not 'Pro'. Missing content section. | “普号” means standard/free accounts, not “Pro”; reverses account tier meaning. |
| google/gemma-3-12b-it | Title: [Community Feedback] Share 2000 codex universal codes, take them yourself, masters. |
Major
[Accuracy]
"universal codes"
Reason: '普号' means 'standard accounts' or 'regular accounts', not 'universal codes'. | Should be 'standard/free accounts' not 'universal codes'. Also 'masters' is awkward; should be 'bros' | ‘普号’ refers to standard/free accounts, not ‘codes’; this changes what is being shared.
Minor
[Style]
"masters"
Reason: Awkward phrasing for '各位佬自取'. 'Help yourselves, bros' is better. | Should be 'help yourselves bros' to match the informal community tone in source. 'Masters' is inappropriate translation of '佬' | Overly formal and deferential compared to casual ‘bros’; tone shift but still understandable. |
| google/gemma-3-1b-it | Share 2000 codex pnums for the community, everyone take them self. |
Major
[Fluency]
"everyone take them self"
Reason: Grammatical error - should be 'take them yourselves' | Ungrammatical phrasing; should be ‘help yourselves’ or similar. | 'pnums' is unclear/incorrect terminology. Should be 'accounts' or 'standard/free accounts'. Also 'take them self' is grammatically incorrect - should be 'help yourselves'. |
| google/gemma-3-4b-it | 【Community Reward】Share 2000 codex ‘普号’s, gentlemen, feel free to take them. | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: [Community Contribution] Sharing 2000 codex tokens, everyone can take them. |
Major
[Accuracy]
"codex tokens"
Reason: Mistranslates '普号' as 'tokens'. Should be 'standard/free accounts' - more accurate meaning. | Mistranslated '普号' (standard/free accounts) as 'tokens'. These are accounts, not currency tokens. | “普号” refers to standard/free accounts, not tokens; changes what is being shared. |
| google/translategemma-4b-it | 【Community Reward】Share 2000 codex '普号', gentlemen, feel free to take them. |
Major
[Terminology]
"'普号'"
Reason: Hypothesis translates as 'gentlemen' but should be 'standard/free accounts' - completely misses the meaning | '普号' means 'standard/free account', should be 'standard accounts' not left as '普号' | Left untranslated; should clarify as 'standard' or 'regular' accounts for broader understanding. |
| tencent/HY-MT1.5-1.8B | Login |
Critical
[Accuracy]
"Login"
Reason: Completely incorrect translation. The source is about sharing 2000 Codex accounts for the community, not login. The hypothesis is completely wrong. | Completely unrelated; ignores meaning about sharing 2000 Codex standard accounts as community giveback and support for GPT-5.4. | Complete hallucination. The source text is about sharing accounts, not a login command. | Completely incorrect translation. Should be '[Giving back to the community] Sharing 2000 Codex standard/free accounts, help yourselves bros' per reference. |
| tencent/HY-MT1.5-7B | "en": "Title: [Giving Back to the Community] Sharing 2000 Codex LinuxDo accounts for everyone to use.", "zh-TW": "Title: [Giving Back to the Community] Sharing 2000 Codex LinuxDo accounts for everyone to use.", "ja": "Title: [Community Contribution] Sharing 2000 Codex LinuxDo accounts for everyone to use.", "ko": "Title: [Community Support] Sharing 2000 Codex LinuxDo accounts for everyone to use.", "de":="Titel: [Zurückgabe an die Gemeinschaft] Bereitstellung von 2000 Codex LinuxDo-Konten für alle zur Nutzung.", "fr": "Titre: [Retour à la communauté] Partage de 2000 comptes Codex LinuxDo pour tout le monde.", "it": "Titolo: [Contributo alla comunità] Condivisione di 2000 account Codex LinuxDo per tutti."}|im_end|> |
Major
[Accuracy]
"Codex LinuxDo accounts"
Reason: Hypothesis is a JSON object. It misinterprets '普号' (standard/free accounts) as 'LinuxDo accounts'. 'Codex' here likely refers to the model, not a user account type. | Reference says 'standard/free accounts', hypothesis incorrectly adds 'LinuxDo' to account type | “普号” means standard/free accounts, not LinuxDo accounts; LinuxDo is not mentioned in source. | Inaccurate - should be 'Codex standard/free accounts', not 'Codex LinuxDo accounts'. The reference clarifies the proper terminology. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is using Team car will affect my personal account after I upgrade to 5.4? Content: After releasing 5.4, I upgraded to Team car overnight.. I don't know if it's the problem with Team car or my US West node is not pure enough |
Major
[Accuracy]
"upgrade to 5.4"
Reason: Misunderstanding - reference says 'joined a Team shared subscription' not 'upgraded to 5.4' | Mistranslates '大号上了team车' and misses that this is about a 'Team shared subscription'. The phrasing is awkward and inaccurate. | Original refers to 5.4 being released and joining a Team subscription that then got downgraded; "upgrade to 5.4" personalizes the upgrade and slightly shifts focus. |
| CohereLabs/tiny-aya-water | Invalid token |
Critical
[Accuracy]
"Invalid token"
Reason: Complete hallucination; the output 'Invalid token' has no relation to the source text about Team accounts and shadowbans. | Translation is missing/invalid; no usable content compared to source and reference. | Hypothesis is 'Invalid token' which is not a translation at all. Should provide an actual English translation of the Chinese title. | Complete translation failure - the hypothesis shows 'Invalid token' which indicates a system error or missing translation. The entire content is missing. |
| Qwen/Qwen2.5-14B-Instruct | Title: Is getting on a team car after 5.4 affecting personal account? Content: After posting 5.4, I got on a team car and it immediately became less intelligent... Not sure if it's due to the team car or my US West node isn't clean enough. |
Major
[Terminology]
"team car"
Reason: Incorrect translation of 'team车' - should be 'Team shared subscription' to match reference | Literal rendering of slang; should be "Team shared subscription" or similar for clarity. | Hypothesis uses 'team car' but reference clarifies 'Team shared subscription'.
Minor
[Terminology]
"it immediately became less intelligent"
Reason: '降智' in this context refers to model performance degradation or shadowbanning. 'Less intelligent' is a bit weak; 'shadowbanned' or 'downgraded' is more precise. | "became less intelligent" approximates "降智" but misses shadowban/downgrade nuance. | Should be 'shadowbanned/downgraded' to match reference terminology |
| Qwen/Qwen2.5-7B-Instruct | Title: Big Number joined team car and intelligence dropped, will it affect my personal account? Content: After publishing 5.4, the team car I joined had its intelligence drop overnight... Not sure if it's an issue with the team car or that my West Coast node isn't pure enough. |
Major
[Accuracy]
"my personal account"
Reason: Original distinguishes main account vs personal workspace; ‘personal account’ is slightly ambiguous. | Missing critical terms 'shadowbanned/downgraded' and 'shared subscription'. | The source asks if it will affect the personal workspace account, not just 'personal account'.
Major
[Accuracy]
"Big Number joined team car and intelligence dropped"
Reason: Should be 'If my main account joins a Team shared subscription and gets shadowbanned/downgraded'. The hypothesis misses the conditional nature and proper terminology. | '大号' means 'main account', not 'Big Number'. 'team车' means 'Team shared subscription', not 'team car'. The translation is completely incorrect. | Mistranslation of '大号' which means 'main account' or 'primary account', not 'Big Number'. |
| Qwen/Qwen3-14B | L'utilizzo su larga scala del veicolo di squadra che riduce l'intelligenza influenzerà il conto personale? |
Minor
[Accuracy]
"L'utilizzo su larga scala del veicolo di squadra"
Reason: "team车" is a shared Team subscription, not a literal "veicolo di squadra" (team vehicle); core concept mistranslated. | Wrong target language - hypothesis is in Italian instead of English. Also awkward phrasing and incomplete - missing content section | The translation incorrectly translates 'team车' as 'veicolo di squadra' instead of 'Team shared subscription'. | Literal translation of 'Team 车' (Team shared subscription/car-pooling account). 'Veicolo di squadra' makes no sense in Italian; it should be 'abbonamento condiviso di Team'. |
| Qwen/Qwen3.5-4B | Title: My main account got on Team car and became less intelligent. Will this affect my personal account? Content: After publishing version 5.4, I joined the Team car overnight and it became less intelligent... Unclear whether it's an issue with Team car or if my US West node isn't pure enough. |
Major
[Terminology]
"Team car"
Reason: Should be 'Team shared subscription' not 'Team car' | Incorrect translation of 'Team车' as 'Team car' instead of 'Team shared subscription' | Unexplained metaphor for shared Team subscription/account; unclear in English. | Should be 'Team shared subscription'. '车' (car) is slang for shared subscription/account, not literal vehicle. |
| Qwen/Qwen3.5-9B | Title: The large car on Team became less intelligent after joining; will this affect my personal account? Content: After releasing version 5.4, I joined the Team car and immediately lost intelligence... Not sure if it's an issue with the Team car or that my US West node isn't pure enough. |
Minor
[Accuracy]
"The large car on Team became less intelligent after joining"
Reason: Misinterprets '大号上了team车降智了' which refers to main account joining a Team shared subscription and being downgraded, not 'large car'. | Should be 'If my main account joins a Team shared subscription and gets shadowbanned/downgraded'. '大号上了team车' means joining a Team shared subscription, not 'large car'. | Should be 'my main account joins a Team shared subscription and gets shadowbanned' - wrong translation of concept | Mistranslation of '大号' (main account/large account). 'Car' is a mistranslation of '车' (often used for shared subscriptions/groups), but 'large car' is nonsensical here. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Has a big name joined Team car and will it affect personal account? |
Minor
[Accuracy]
"Has a big name joined Team car and will it affect personal account?"
Reason: Completely misinterpreted '大号' (main account) as 'big name' and 'team 车' (team shared subscription) as 'Team car'. | Misreads “大号” as ‘big name’ rather than main account, and “上了team车降智了会影响个人账户吗” as a question about someone joining, instead of whether the main account getting downgraded on a Team subscription affects the personal account. | Completely misinterprets source. '大号' means 'main account', not 'big name'. '上了team车' means 'joined Team shared subscription'. Should ask if shadowban on Team account affects personal workspace. Missing content section. | Incorrect translation - '大号' means 'main account' not 'big name'. 'team车' means 'Team shared subscription' not 'Team car'. Also missing '降智了' (shadowbanned) in title. |
| google/gemma-3-12b-it | Title: Does being on a large team car and lowering intelligence affect my personal account? Content: After releasing 5.4, I was on a large team car and lowered intelligence overnight... I'm not sure if it's a problem with the team car or if my West US node isn't clean enough. |
Major
[Accuracy]
"being on a large team car"
Reason: Should be 'Team shared subscription' not 'large team car' - 'team车' is slang for team/shared subscription, not a literal car | Misinterprets ‘大号上了team车’ which refers to main account joining a Team subscription, not a ‘large team car’ literalization. | Literal translation of 'Team 车'. '车' here is slang for 'shared subscription' or 'group buy', not a vehicle. | Mistranslation of '大号上了team车' - should be 'If my main account joins a Team shared subscription and gets shadowbanned/downgraded' |
| google/gemma-3-1b-it | Translation of '大号上了team车降智了会影响个人账户吗' in en |
Critical
[Accuracy]
"Translation of '大号上了team车降智了会影响个人账户吗' in en"
Reason: The hypothesis is a meta-commentary instruction rather than a translation. It fails to translate the Title or the Content. | Meta-description instead of an actual translation; omits the question and the explanation about Team shared subscription, downgrade, and node cleanliness. | Hypothesis is not a translation but a meta-description. Should provide actual English translation of the title and content. | Missing content translation. | Not a translation - just describes that it's a translation request without providing actual translation |
| google/gemma-3-4b-it | Title: Large team car drop has affected personal account. Content: After releasing 5.4, my team car dropped significantly overnight... I'm not sure if it's a team car issue or if my Western US node isn't pure enough. |
Major
[Accuracy]
"Large team car drop has affected personal account."
Reason: Literal mistranslation of '大号上了 team 车'. '大号' means 'main account', '上了...车' means 'joined a shared subscription/group buy'. 'Team car' is nonsensical. | Misinterprets title; actual question is whether joining a Team that gets downgraded will affect personal account, not stating it has affected it | Mistranslation. Should be 'If my main account joins a Team shared subscription and gets shadowbanned/downgraded, will it affect my personal workspace account?' The hypothesis misses the question structure. |
| google/translategemma-12b-it | Title: Does being on a large team server and experiencing a drop in speed affect my personal account? |
Major
[Accuracy]
"large team server"
Reason: Completely mistranslates the question. Should be about Team shared subscription and shadowban/downgrade effects, not 'drop in speed'. | Misreads “大号” as large server; it means main account. Also omits “Team shared subscription” nuance. | Reference says 'If my main account joins a Team shared subscription and gets shadowbanned/downgraded' - not about server speed | Mistranslated '大号上了 team 车'. '大号' means 'main account', not 'large server'. 'Team 车' refers to a shared subscription group ('Team shared subscription'), not a physical server. |
| google/translategemma-4b-it | Title: Large team car suddenly dropped performance, will it affect my personal account? |
Critical
[Accuracy]
"Large team car"
Reason: Literal translation of '大号上了 team 车'. '大号' means 'main account', '车' means 'shared subscription/group buy'. 'Large team car' is nonsensical. | Completely wrong translation - '大号' means 'main account', not 'large team car' | Should be 'main account joins a Team shared subscription' | '大号上了team车' refers to main account joining a Team shared subscription, not 'large team car'; literal slang mistranslation. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '大号上了team车降智了影响了个人账户吗'", "zh-TW": "翻译为‘大號上線的Team車讓我的智商下降了,這會影響個人賬戶嗎’", "ja": "翻译为‘大號上線的Team車讓我的智慧降低了,這會影響個人帳戶嗎’", "ko": "翻译为‘대상이 팀 차를 이용해 지능을 낮추었는데, 이가 개인 계정에 영향을 미칠까?’", "de": "翻译为‘Der große Fahrer hat mit dem Team-Fahrzeug die Intelligenz gesenkt und das könnte den persönlichen Account beeinflussen’,", "fr": "翻译为‘Le grand pilote a réduit l'intelligence avec le véhicule de l'équipe et cela pourrait affecter le compte personnel’, "it": "翻译为‘Il ha ridotto l'intelligenza con il veicolo della squadra e questo potrebbe influenzare il conto personale’”} | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | "en": "Title: Will Using the Large Team Vehicle Reduce Intelligence and Affect Personal Accounts?", "zh-TW": "标题:使用大型团队车辆会降低智力并影响个人账户吗?", "ja": "标题:使用大型团队车辆会降低智力并影响个人账户吗?", "ko": "标题:使用大型团队车辆会降低智力并影响个人账户吗?", "de":="Titel: Wird die Verwendung des großen Team-Fahrzeugs Intelligenz verringern und persönliche Konten beeinträchtigen?", "fr": "Titre : L’utilisation du grand véhicule d’équipe réduira-t-elle l’intelligence et affectera-t-elle les comptes personnels ?", "it": "Titolo: L’uso del grande veicolo di squadra ridurrà l’intelligenza e influenzerà i conti personali?"} |
Major
[Accuracy]
"Large Team Vehicle / 大型团队车辆"
Reason: Reference means 'Team shared subscription', not literal 'team vehicle' | Completely wrong metaphor; 车 refers to shared subscription/account, not vehicle, and “大号” is main account, not large account. | Literal translation of slang '车' (group buy/shared ride/subscription). | Completely mistranslates the meaning - should be about 'Team shared subscription' and 'shadowbanned/downgraded', not about 'large team vehicle' reducing intelligence. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is there any difference between the performance of the gpt 5.4 web version and the gpt 5.2 web version? |
Major
[Accuracy]
"Is there any difference between the performance of the gpt 5.4 web version and the gpt 5.2 web version?"
Reason: Completely misses the key distinction in the title about '降智还是模型路由' (shadowbanning/downgrade vs model routing). The hypothesis title is too generic and loses critical meaning. | Omits core concept of downgrade/shadowban vs model routing and the specific comparison of tool-calling and reasoning differences between 5.4 and 5.2. | Missing the key distinction - reference specifically mentions tool calling and reasoning differences between versions | The source specifically asks about '降智' (downgrade/lobotomy) vs 'model routing'. The hypothesis generalizes this to 'any difference in performance', losing the specific technical concern about intentional degradation. |
| CohereLabs/tiny-aya-water | Title: Is there any difference between the performance of the gpt-5.4 and gpt-5.2 models? |
Major
[Accuracy]
"Is there any difference between the performance of the gpt-5.4 and gpt-5.2 models?"
Reason: Over-generalized the title, losing specific details about 'shadowbanned/downgraded', 'model routing', and 'tool calling/reasoning'. | Overgeneralizes; source is specifically about web version downgrade/model routing, tool use, and reasoning differences, not generic performance comparison. | Loses critical context about '降智' (shadowbanning/downgrade) and '模型路由' (model routing). The translation is too generic and misses the specific technical question being asked. | Incorrect translation - the source asks about '降智还是模型路由' (shadowban/downgrade vs model routing), not just 'difference in performance'. Also missing Content translation. |
| Qwen/Qwen2.5-14B-Instruct | La versione web di GPT 5.4 è depotenziata o routing del modello? Confronto con GPT 5.2 |
Critical
[Accuracy]
"La versione web di GPT 5.4 è depotenziata o routing del modello? Confronto con GPT 5.2"
Reason: Completely wrong language - output is in Italian instead of English | Hypothesis is in Italian instead of English translation. Should be in English like reference. | The hypothesis is in Italian, which is the wrong language entirely. The prompt header lists the target language as 'zh', but the source is Chinese and the other hypotheses are in English, so the task is effectively zh-to-en translation. | Entire hypothesis is in Italian instead of target language (zh) and omits explicit mention of tool calling and reasoning comparison from the source. |
| Qwen/Qwen2.5-7B-Instruct | Title: GPT 5.4 Web Version Dumber or Model Routing? Compared to GPT 5.2 Content: GPT 5.4 Model Q&A No tool calling and thinking GPT 5.2 Model Q&A With tool calling and thinking |
Minor
[Fluency]
"No tool calling and thinking / With tool calling and thinking"
Reason: Missing articles and connectors; slightly awkward English but understandable. | Sentence fragment style is acceptable for notes, but could be smoother. | Should be 'No tool calling and reasoning' to match reference. |
| Qwen/Qwen3-14B | Title: Is GPT 5.4 Web Version Dumb or Model Routing? Comparison with GPT 5.2 Content: GPT 5.4 model QA has no tool calling and thinking; GPT 5.2 model QA has tool calling and thinking | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: gpt 5.4 Web version nerfed vs model routing, comparing gpt 5.2 Content: gpt 5.4 model Q&A has no tool calling and reasoning; gpt 5.2 model Q&A includes tool calling and reasoning |
Minor
[Terminology]
"nerfed"
Reason: Incorrect translation of '降智' as 'nerfed' instead of 'shadowbanned/downgraded' | Should be 'shadowbanned/downgraded' not 'nerfed' | Captures downgrade but omits the ‘shadowban’ nuance in the reference phrasing. | While 'nerfed' is understood, 'shadowbanned/downgraded' is more precise for the specific context of model quality reduction discussed in the source. | Should be 'shadowbanned/downgraded'. '降智' is specific technical slang for performance degradation in this context. |
| Qwen/Qwen3.5-9B | Title: Is GPT 5.4 web version nerfed or model routing? Compared with GPT 5.2 Content: GPT 5.4 model Q&A lacks tool invocation and reasoning; GPT 5.2 model Q&A includes tool invocation and reasoning. |
Minor
[Terminology]
"nerfed"
Reason: Reference suggests 'shadowbanned/downgraded' for '降智'. | Should be 'shadowbanned/downgraded' not 'nerfed' | Should be 'shadowbanned/downgraded'. '降智' refers to model degradation, not gaming nerfs. | Should be translated as 'shadowbanned/downgraded' instead of 'nerfed'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: GPT 5.4 Web Version Downgraded vs Model Routing Comparison with GPT 5.2 Content: GPT 5.4 Model Q&A Without Tool Invocation and Thinking GPT 5.2 Model Q&A With Tool Invocation and Thinking |
Minor
[Fluency]
"Model Q&A Without Tool Invocation and Thinking / With Tool Invocation and Thinking"
Reason: Content formatting is awkward with line breaks. Should be formatted as continuous text or properly structured. Minor issue as meaning is preserved. | Slightly awkward phrasing but meaning preserved. | Slightly choppy, could be smoother like 'No tool calling and reasoning'. |
| google/gemma-3-12b-it | Title: GPT 5.4 Web Version Nerfed or Model Routing, Comparison with GPT 5.2 Content: GPT 5.4 model Q&A has no tool calling and thinking; GPT 5.2 model Q&A has tool calling and thinking. | Pass (no consensus errors) |
| google/gemma-3-1b-it | The question of whether GPT-5.4 is 'decreased' or 'shadow banned' compared to GPT-5.2. |
Major
[Accuracy]
"The question of whether GPT-5.4 is 'decreased' or 'shadow banned' compared to GPT-5.2."
Reason: Only reflects the title notionally and omits body details about lack/presence of tool calling and reasoning for 5.4 vs 5.2. | The hypothesis only provides a partial title translation. It completely omits the Content which details the difference in tool calling and reasoning between versions. | Incomplete translation - only translates the title, not the content which provides the actual comparison data. Also 'decreased' is awkward - should be 'downgraded' or 'shadowbanned'. |
| google/gemma-3-4b-it | Title: gpt 5.4 web version downgraded or model routing, comparison with gpt 5.2 Content: gpt 5.4 model question answering has no tool calling and thinking; gpt 5.2 model question answering has tool calling and thinking | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Is gpt 5.4 web version nerfed or is it model routing? Comparison with gpt 5.2. Content: gpt 5.4 model answers without tool calls and thinking. gpt 5.2 model answers with tool calls and thinking. | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: gpt 5.4 web version downgraded or model routing, comparison of gpt 5.2 |
Minor
[Accuracy]
"gpt 5.4 web version downgraded or model routing"
Reason: Missing 'shadowbanned' in the translation. | Incomplete translation; the Content section detailing the difference in tool calling and reasoning is missing. | Conveys main idea but omits 'shadowbanned' nuance; still close to reference. |
| tencent/HY-MT1.5-1.8B | "en": "The title 'gpt 5.4 网页版降智还是模型路由,对比 gpt 5.2' describes a comparison between the GPT 5.4 web version and the GPT 5.2 model. It appears to be about determining whether the GPT 5.4 web version has reduced intelligence or if there is a routing issue with the model.", "zh-TW": "標題「GPT 5.4網頁版是降智還是模型路由,比較GPT 5.2」描述了對GPT 5.4網頁版與GPT 5.2模型的區別。似乎是在討論GPT 5.4網頁版是否降低了智能程度,或者模型中是否存在路由問題。", "ja": "タイトル「GPT 5.4ウェブ版は降智か、モデルのルーティングか?GPT 5.2と比較して」は、GPT 5.4ウェブ版とGPT 5.2モデルの違いについて述べています。どうやらGPT 5.4ウェブ版のインテリジェンスが低下したのか、またはモデルにルーティングの問題があるのかを比較しているようです。", "ko": "제목은 ‘GPT 5.4 웹 버전이 지능을 낮추는가, 아니면 모델의 라우팅 문제인가? GPT 5.2와 비교해 보자’입니다. 이는 GPT 5.4 웹 버전과 GPT 5.2 모델 간의 차이를 설명하는 내용입니다. 아마도 GPT 5.4 웹 버전의 지능이 감소했거나 모델에 라우팅 문제가 있는지를 비교하고자 하는 것 같습니다.", "de": "제목은 ‘GPT 5.4 웹버전은 지능을 낮추나요, 아니면 모델의 라우팅 문제인가요? GPT 5.2와 비교해보세요’입니다. 이는 GPT 5.4 웹버전과 GPT 5.2 모델의 차이점을 설명하는 내용입니다. 아마도 GPT 5.4 웹버전의 지능이 감소했거나 모델에 라우팅 문제가 있다고 생각하고 있습니다. ", "it": "제목은 ‘GPT 5.4 웹 버전은 지능을 낮추나요, 아니면 모델의 라우팅 문제인가요? GPT 5.2와 비교해보세요’입니다. 이는 GPT 5.4 웹버전과 GPT 5.2 모델의 차이점을 설명하는 내용입니다. 아마도 GPT 5.4 웹버전의 지능이 감소했거나 모델에 라우팅 문제가 있다고 생각하고 있습니다. ", "fr": "제목은 ‘GPT 5.4 웹 버전은 지능을 낮추나요, 아니면 모델의 라우팅 문제인가요? GPT 5.2와 비교해보세요’입니다. 이는 GPT 5.4 웹버전과 GPT 5.2 모델의 차이점을 설명하는 내용입니다. 아마도 GPT 5.4 웹버전의 지능이 감소했거나 모델에 라우팅 문제가 있다고 생각하고 있습니다. ", "de": "제목은 ‘GPT 5.4 웹 버전은 지능을 낮추나요, 아니면 모델의 라우팅 문제인가요? GPT 5.2와 비교해보세요’입니다. 이는 GPT 5.4 웹버전과 GPT 5.2 모델의 차이점을 설명하는 내용입니다. 아마도 GPT 5.4 웹버전의 지능이 감소했거나 모델에 라우팅 문제가 있다고 생각하고 있습니다. ", "it": "제목은 ‘GPT 5.4 웹 버전은 지능을 낮추나요, 아니면 모델의 라우팅 문제인가요? GPT 5.2와 비교해보세요’입니다. 이는 GPT 5.4 웹버전과 GPT 5.2 모델의 차이점을 설명하는 내용입니다. 아마도 GPT 5.4 웹버전의 지능이 감소했거나 모델에 라우팅 문제가 있다고 생각 | 通过 (无共识错误) |
| tencent/HY-MT1.5-7B | Translation of '助手' in en |
Critical
[Accuracy]
"Translation of '助手' in en"
Reason: Hypothesis is incomplete and nonsensical - it only says 'Translation of 助手 in en' without providing actual translation. Should translate the full title and content. | Incorrect translation. | Completely unrelated to source; does not translate title or content at all. |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Is there anyone who can explain simply how ordinary people can apply AI or say use Lóngxiā (Linux Do) |
严重
[准确性]
"Is there anyone who can explain simply how ordinary people can apply AI or say use Lóngxiā (Linux Do)"
理由: Content missing - reference includes detailed content about AI anxiety and普通人 (ordinary people) using OpenClaw | Incorrect localization. '龙虾' is the tool 'OpenClaw'. 'Linux Do' is the name of the forum/community, not the tool itself. Equating them directly in parentheses is confusing. | Keeps pinyin 'Lóngxiā' instead of translating to 'OpenClaw'. Also incorrectly identifies it as 'Linux Do' in parentheses, which is confusing and inaccurate. |
| CohereLabs/tiny-aya-water | Title: Is there anyone who can explain simply how ordinary people can apply AI or say use Lóngxiā (LinuxDo) ? |
Major
[Accuracy]
"Lóngxiā (LinuxDo)"
Reason: "龙虾" here refers to OpenClaw app, not "LinuxDo" generally; addition of "(LinuxDo)" is misleading. | Failed to translate '龙虾' (OpenClaw) correctly, using Pinyin and incorrectly associating it with LinuxDo. | Mistranslates '龙虾' as 'Lóngxiā (LinuxDo)' with pinyin and incorrect parenthetical. Should be 'OpenClaw'. Also awkward phrasing with 'or say use'. |
| Qwen/Qwen2.5-14B-Instruct | Title: Can someone explain in simple terms how ordinary people can apply AI or use lobster? Content: So I'm wondering if anyone has the same worries about how fast AI is developing and how ordinary people can integrate themselves and use AI beyond just simple searches. Can lobsters be utilized by ordinary people to achieve unexpected functions? | Pass (no consensus errors) |
| Qwen/Qwen2.5-7B-Instruct | Is there anyone who can explain AI to ordinary people in an easy-to-understand way? Or how can ordinary people apply or use AI? |
Major
[Accuracy]
"Is there anyone who can explain AI to ordinary people in an easy-to-understand way? Or how can ordinary people apply or use AI?"
Reason: Hypothesis only provides partial title translation and misses the full context. Should include the reference to 'OpenClaw' and the complete question structure. | Missing critical term 'OpenClaw'. | Failed to translate the specific mention of '龙虾' (OpenClaw) in the title and omitted the entire content section discussing anxiety and specific use cases. | The translation omits the Content entirely. The source has extensive Content about anxiety regarding AI development and using OpenClaw (龙虾) as a tool. | Translation only covers first part of title; omits mention of ‘or use OpenClaw’ and the body content about anxiety, AI pace, integration, and OpenClaw’s potential. |
| Qwen/Qwen3-14B | Qualcuno può spiegare in modo semplice come i cittadini comuni possano applicare l'IA o utilizzare i gamberi? |
Major
[Accuracy]
"Qualcuno può spiegare in modo semplice come i cittadini comuni possano applicare l'IA o utilizzare i gamberi?"
Reason: Mistranslated the slang '龙虾' (OpenClaw) as literal 'gamberi' (shrimps/prawns). | Wrong target language - hypothesis is in Italian. Also mistranslates '龙虾' (OpenClaw) as 'gamberi' (shrimp) | The translation incorrectly translates '龙虾' as 'gamberi' instead of 'OpenClaw'. |
| Qwen/Qwen3.5-4B | Title: Can anyone explain simply how ordinary people can apply AI or use Lobster? Content: So I'd like to ask fellow friends if you share this anxiety. With AI developing so rapidly, how can ordinary people integrate and utilize it? Beyond simple search, can Lobster be used by regular people to achieve unexpected functions? |
Major
[Terminology]
"Lobster"
Reason: Should be 'OpenClaw' not 'Lobster' throughout | Does not preserve the community-specific product name OpenClaw for “龙虾”. | Incorrect translation of '龙虾' as 'Lobster' instead of 'OpenClaw' |
| Qwen/Qwen3.5-9B | Title: Can anyone explain in simple terms how ordinary people can apply AI or use LinuxDo? Content: So I want to ask everyone here if you share the same anxiety. With AI developing so rapidly nowadays, how can regular people integrate themselves and utilize AI? Beyond simple search, can LinuxDo be used by ordinary people to unleash unexpected capabilities? |
Major
[Terminology]
"LinuxDo"
Reason: 龙虾 here denotes OpenClaw; using LinuxDo is incorrect in this discussion context. | Should be 'OpenClaw' not 'LinuxDo' throughout | Should be translated as 'OpenClaw' instead of 'LinuxDo'.
Critical
[Accuracy]
"LinuxDo"
Reason: Should be 'use OpenClaw' | Should be 'OpenClaw'. '龙虾' is slang for OpenClaw software, not LinuxDo. | Incorrect entity translation. '龙虾' refers to 'OpenClaw'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Is there anyone who can explain to ordinary people how to apply AI or say use '龙虾' in a simple way Content: So I wanted to ask you all, have any of you had the same anxiety? With AI developing so fast, how can ordinary people integrate and use AI beyond just simple searches? Is '龙虾' something that ordinary people can use and achieve unexpected functions? |
Major
[Terminology]
"'龙虾'"
Reason: Should use product name OpenClaw in this context rather than literal ‘龙虾’ to match established terminology. | Failed to translate the slang '龙虾' to 'OpenClaw', leaving it in Chinese characters within an English sentence. | Incorrect translation of '龙虾' - should be 'OpenClaw' not '龙虾'. Also '小白' means 'newbie' not 'ordinary people'. |
| google/gemma-3-12b-it | Title: Is there anyone who can explain in simple terms how ordinary people can apply AI or use Lobster? Content: So I wanted to ask you all if you have the same anxiety. With the rapid development of AI, how can ordinary people integrate and use AI? Besides simple searches, can Lobster be used by ordinary people to achieve unexpected functions? | Pass (no consensus errors) |
| google/gemma-3-1b-it | So, I'm wondering if anyone else has the same anxiety about AI development so fast. How can ordinary people integrate with AI? Besides simple searches, is lobster not something that ordinary people can use and realize unexpected functionality? |
Major
[Terminology]
"is lobster not something that ordinary people can use"
Reason: Incorrect translation of '龙虾' - should be 'OpenClaw' not 'lobster' | '龙虾' is OpenClaw, not literal lobster; misuses key term. | Failed to translate the slang '龙虾' to 'OpenClaw', using the literal 'lobster'. |
| google/gemma-3-4b-it | Does anyone have a simple explanation of how ordinary people can apply AI or use Lobster? I'm feeling anxious about the rapid development of AI and how ordinary people can integrate and use AI beyond just basic searches. Could Lobster be used by ordinary people to unleash unexpected functions? | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Can anyone explain in simple terms how ordinary people can apply AI or use "Lobster"? Content: So I want to ask you all experienced users, do you have the same concerns? With the rapid development of AI, how can ordinary people integrate and use AI? Besides simple searches, can "Lobster" be used by ordinary people to achieve unexpected functions? | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Does anyone have a simple explanation for how ordinary people can use AI or 'Lobster'? Content: So I was wondering if anyone else has similar concerns. With AI developing so rapidly, how can ordinary people integrate and use AI beyond basic search? Can 'Lobster' be used by ordinary people to unleash unexpected functions? | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
Critical
[Accuracy]
"Translation of '登录' in en"
Reason: The hypothesis is a meta-instruction referencing a word ('登录') not present in the source title, and fails to provide the actual translation. | Hypothesis is completely wrong. '登录' means 'login' but this is not the title. Should translate the actual title 'Can anyone explain in simple terms how ordinary people can apply AI, or rather, use OpenClaw?' per reference. | Hypothesis is not a translation of the source content. The source is about how ordinary people can use AI/OpenClaw, but the hypothesis translates '登录' (login) which doesn't appear in the source at all. | Translates unrelated word '登录' (login) instead of the provided title about applying AI/using OpenClaw; entirely misses source meaning. |
| tencent/HY-MT1.5-7B | "en": "Title: Can someone explain in simple terms how ordinary people can use AI or, in other words, how to apply ‘Linux’?", "zh-TW": "Title: 能有人用简单的方式解释一下普通人在如何运用AI,或者换句话说,如何使用‘Linux’吗?", "ja": "Title: 簡単に説明してもらえますか?一般の人がどのようにAIを利用できるのか、あるいは言い換えれば『Linux』の使い方です。", "ko": "Title: 일반인이 어떻게 AI를 활용할 수 있는지, 다시 말해 ‘Linux’를 사용하는 방법에 대해 간단히 설명해 주실 수 있나요?", "de":="Titel: Kann jemand einfach erklären, wie normale Menschen AI nutzen können oder anders ausgedrückt, wie man ‘Linux’ verwendet?", "fr":="Titre : Peut-on expliquer simplement comment les gens ordinaires peuvent utiliser l’IA ou, pour reprendre les mots de ‘Linux’, comment l’utiliser ?", "it":="Titolo: Qualcuno può spiegare in modo semplice come le persone comuni possano utilizzare l’IA o, per usare le parole di ‘Linux’, come farne uso?"}|im_end|> | 通过 (无共识错误) |
| Model under test | MT output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: I want to raise another homemade shrimp, any recommendations for good value-for-money server models? |
Major
[Accuracy]
"homemade shrimp"
Reason: Mistranslates '养龙虾' as 'raise shrimp' literally. Should be 'host OpenClaw'. Also misses '国产' (domestic) in the context of models. | Should be 'host another OpenClaw using domestic models' - missing context about server and model recommendations | Mistranslation of '国产龙虾'. '国产' means 'domestic' (Chinese-made models). 'Homemade shrimp' sounds like cooking. '龙虾' here is the tool 'OpenClaw'. | "国产龙虾" means OpenClaw using domestic models, not literal shrimp; loses AI/tool meaning. |
| CohereLabs/tiny-aya-water | Title: I want to raise another homemade shrimps, any recommendations for good value-for-money server models? |
Critical
[Accuracy]
"homemade shrimps"
Reason: Literal translation of '国产龙虾' (domestic OpenClaw/models) as 'homemade shrimps' loses the technical meaning. | Mistranslates '龙虾' as 'shrimps' instead of 'OpenClaw'. Also mistranslates '国产' (domestic) as 'homemade'. The metaphor of 'raising/hosting' is lost. | Literal and number mismatch; should be singular and refer to hosting another OpenClaw instance with domestic models, not actual shrimp. |
| Qwen/Qwen2.5-14B-Instruct | Title: Want to raise another domestic lobster, recommend machine model! Content: I want to raise another domestic one. Are there any recommended high cost-performance ratio machines/servers? Also, are there any good models that are more user-friendly? |
Major
[Accuracy]
"raise another domestic one"
Reason: Reinforces incorrect literal interpretation of "养" (host/run) as "raise" an animal instead of hosting another instance. | Literal translation of '国产龙虾' (domestic OpenClaw/models). Implies raising actual lobsters. | Should be 'host another OpenClaw using domestic models' to match reference meaning |
| Qwen/Qwen2.5-7B-Instruct | Title: Want to Raise Another Domestic Lobster? Seeking Recommended Machine Models! Content: I want to raise another domestic one. Do you have any recommendations for cost-effective machine servers and models that are more practical? Fellow lobster enthusiasts, please give me some advice. |
Major
[Terminology]
"Domestic Lobster"
Reason: Failed to translate the slang '龙虾' (OpenClaw), using the literal animal name throughout. | '龙虾' is slang for OpenClaw, not literal lobster. Should be 'OpenClaw' throughout. | Here 国产龙虾 refers to hosting OpenClaw with domestic models; ‘domestic lobster’ literalization is incorrect.
Major
[Accuracy]
"Want to Raise Another Domestic Lobster?"
Reason: Should be 'Want to host another OpenClaw using domestic models'. The hypothesis uses 'Lobster' instead of 'OpenClaw' and misses the 'domestic models' context. | Misinterpretation. '国产龙虾' means 'OpenClaw running on domestic (Chinese) models/servers', not a domestic species of lobster. | Missing critical terms 'host' and 'OpenClaw'. |
| Qwen/Qwen3-14B | Vorrei allevare un altro granchio locale, ci sono raccomandazioni per i modelli di macchine? |
Minor
[Accuracy]
"modelli di macchine"
Reason: Only mentions machine models, omitting explicit reference to AI models that source also asks recommendations for. | Ambiguous translation. Source asks for 'machine server' and 'model'. 'Modelli di macchine' sounds like 'car models' or 'machine learning models' ambiguously, missing the server hardware aspect. | Wrong target language - hypothesis is in Italian. Also mistranslates '龙虾' (OpenClaw) as 'granchio' (crab) and misses the context about domestic models |
| Qwen/Qwen3.5-4B | Title: Want to raise another domestic lobster, recommend machine models! Content: I want to raise one more domestically produced one. Do you have any recommended cost-effective machine servers? Also, do you have some cost-effective and easy-to-use models? Lobster-raising friends, please share your opinions. |
Critical
[Accuracy]
"raise another domestic lobster"
Reason: Failed to translate '龙虾' (OpenClaw). The phrase 'domestic lobster' sounds like raising actual animals. | “养个国产龙虾” here means hosting another OpenClaw instance using domestic models, not literally raising lobsters. | Should be 'OpenClaw'. '龙虾' is slang for OpenClaw software, not literal lobsters. |
| Qwen/Qwen3.5-9B | Title: Want to raise another domestic crayfish, looking for recommended machine models! Content: I want to adopt another domestic one. Any recommendations for cost-effective server machines and some good value-for-money models? Friends who keep crayfish, please share your opinions. | Pass (no consensus errors) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | I want to keep another domestic shrimp, do you have any recommendations for machine models that have good cost performance? Are there any models that are both cost-effective and easy to use? My shrimp-keeping friends, please share your opinions. |
Major
[Terminology]
"domestic shrimp"
Reason: Literal translation of '国产龙虾' (domestic OpenClaw/models) as 'domestic shrimp'. | Literal translation of 国产龙虾; should be “domestic OpenClaw (deployment/models)” not actual shrimp. | Incorrect translation of '龙虾' - should be 'OpenClaw' not 'shrimp'. '养龙虾' means 'host OpenClaw' not 'keep shrimp'. |
| google/gemma-3-12b-it | Title: Thinking about raising another domestic lobster, looking for model recommendations! |
Critical
[Accuracy]
"domestic lobster"
Reason: Literal translation of '国产龙虾'. Refers to 'domestic models running OpenClaw' or 'Chinese models', not raising actual lobsters. | Refers to hosting OpenClaw with domestic models; ‘domestic lobster’ is a literal mistranslation that obscures the meaning. | '龙虾' should be 'OpenClaw', not 'lobster'. Also missing content translation. |
| google/gemma-3-1b-it | Recommend some high-value Chinese lobster farming machines. |
Minor
[Accuracy]
"Recommend some high-value Chinese lobster farming machines."
Reason: Completely inadequate translation. Missing the title context about wanting to host another OpenClaw using domestic models. '龙虾' should be 'OpenClaw', not 'lobster'. Should include both title and content translation. | Misreads ‘养国产龙虾’ and ‘机器服务器’ about hosting OpenClaw with domestic models; turns it into literal lobster farming machinery. | Completely wrong meaning - reference is about servers and AI models, not farming machines | Severe mistranslation. The source asks for server and model recommendations for hosting 'OpenClaw' (slang: 龙虾). The hypothesis interprets this as literal 'lobster farming machines'. It also omits the second sentence of the content. |
| google/gemma-3-4b-it | I want to raise another domestic lobster, please recommend machine models! |
Major
[Accuracy]
"please recommend machine models!"
Reason: The hypothesis only translates the title and omits the 'Content' section asking for server and model recommendations. | Mistranslation of '龙虾' - should be 'OpenClaw', not 'lobster'. Also incomplete - missing the content about servers and models. | Title only; omits details about cost-effective servers and models from content |
| google/translategemma-12b-it | Title: Thinking about raising another domestic lobster, looking for recommended machine models! Content: I want to raise another one domestically. Are there any cost-effective server recommendations? Also, are there any good and affordable models? Experienced lobster farmers, please give me some advice. |
Major
[Accuracy]
"I want to raise another one domestically."
Reason: Mistranslates '龙虾' as 'lobster'. Should be 'OpenClaw' - the specific tool. Also '养' means 'host' not 'raise'. | Reference says 'Want to host another OpenClaw using domestic models' - 'host' not 'raise' | Misses key idea “using domestic models”; implies location-based hosting rather than model origin. |
| google/translategemma-4b-it | Title: Looking to raise more domestic lobsters, recommend machine models! | Pass (no consensus errors) |
| tencent/HY-MT1.5-1.8B | Translation of '想再养个国产龙虾,求推荐机器模型!' in English |
Critical
[Accuracy]
"Translation of '想再养个国产龙虾,求推荐机器模型!' in English"
Reason: The hypothesis is a meta-instruction describing the task rather than providing the actual translation. | Hypothesis does not provide actual translation. Should be 'Want to host another OpenClaw using domestic models, looking for server and model recommendations!' per reference. | Meta description only, no actual translation. Should be like 'Want to host another OpenClaw using domestic models, looking for server and model recommendations!'. | Hypothesis is not an actual translation but a placeholder indicating the task. It fails to provide the translated content. |
| tencent/HY-MT1.5-7B | "en": "Title: Want to Raise Another Domestic Lobster? Looking for Recommended Machine Models!", "zh-TW": "标题:想再养一只国产龙虾?寻求推荐的机器模型!", "ja": "タイトル:もう一匹国産ロブスターを飼いたいですか?おすすめのマシンモデルを探しています!", "ko": "제목: 또 다른 국산 로브스터를 키우고 싶으신가요?추천할 만한 마시ン 모델이 있을까요?", "de":="Titel: Möchten Sie noch einen inländischen Hummer züchten? Suchen Sie nach empfohlenen Maschinenmodellen?", "fr":="Titre : Souhaiteriez-vous élever un autre homard national ? Cherchez-vous des modèles de machines recommandés ?", "it":="Titolo: Volete allevare un altro gambero nazionale? Cercate modelli di macchine consigliati?"} |
Major
[Accuracy]
"Domestic Lobster / 龙虾"
Reason: Mistranslation of '国产龙虾' - should be 'OpenClaw using domestic models', not 'Domestic Lobster'. The hypothesis provides multiple language versions when only English was requested. | Literal translation of slang. '养' in this context means 'host' or 'deploy', and '龙虾' is 'OpenClaw'. | Misreads 龙虾 as literal lobsters instead of OpenClaw instance on domestic models. |
| Model under test | MT output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Content: ‘Intelligence quotient’ is only 20 but there’s 512 in Codex. Is not a shrimp using full power of GPT possible? |
Major
[Accuracy]
"Is not a shrimp using full power of GPT possible?"
Reason: Mistranslates “龙虾” (OpenClaw product/name) as literal shrimp and changes the rhetorical question structure, reducing clarity and correctness. | Source refers to '龙虾' (OpenClaw/lobster), not shrimp. Incorrect terminology | Mistranslation of '龙虾' (lobster) as 'shrimp'. In this context, 'lobster' refers to the specific project name 'OpenClaw' (open+claw), whereas shrimp is a different creature. | Incorrect translation: 'lobster' (龙虾) was mistranslated as 'shrimp', and the entity 'OpenClaw' was omitted entirely
Major
[Fluency]
"Is not a shrimp using full power of GPT possible?"
Reason: Awkward phrasing compared to the reference | The sentence structure is awkward and unnatural. The reference 'Can't OpenClaw utilize...' is much more fluent. | Awkward and ungrammatical phrasing. Should be 'Can't OpenClaw utilize the full-intelligence GPT?' or similar |
| CohereLabs/tiny-aya-water | Content: The ‘intelligence quotient’ is only 20 but there’s 512 in Codex. Isn’t that enough for a full-intelligent GPT? |
Major
[Accuracy]
"Isn’t that enough for a full-intelligent GPT?"
Reason: Mistranslation: '智力水平' (intelligence level) is incorrectly translated to 'intelligence quotient'; '龙虾' (OpenClaw) is omitted; '满智力的 GPT' (full-intelligence GPT) is incorrectly rendered as 'full-intelligent GPT' | The hypothesis translates the content but completely omits the Title translation. Additionally, it mistranslates '龙虾' (Lobster/OpenClaw) as implicit or missing, changing the subject of the question compared to the source and reference. | Misinterprets rhetorical question; original asks whether OpenClaw can use full-intelligence GPT, not whether 512 is enough. | Mistranslates '龙虾就用不上满智力的 GPT吗' which means 'Can't OpenClaw utilize the full-intelligence GPT?' The hypothesis changes the meaning to a question about sufficiency rather than capability. |
| Qwen/Qwen2.5-14B-Instruct | Title: Has your openclaw degraded after integrating with GPT? Content: The 'intelligence level' is only 20 but it's 512 in codex. Does that mean lobsters can't use the full-intelligence GPT? |
Major
[Accuracy]
"lobsters can't use"
Reason: Mistranslation of '龙虾就用不上'. Should be 'OpenClaw can't utilize' or similar, referring to the service/product, not literal lobsters. Reference correctly identifies this as 'OpenClaw'. | Mistranslation of context. '龙虾' (Lobster) is a nickname for the software 'OpenClaw', not the animal. Translating it as 'lobsters' loses the intended meaning. | Incorrect use of 'lobsters' instead of 'OpenClaw' |
| Qwen/Qwen2.5-7B-Instruct | Title: Has your openclaw's接入 to GPT caused a downgrade? Content: "Intellectual level" is only 20 while it has 512 in codex. Don't lobsters deserve a fully intelligent GPT? |
Major
[Accuracy]
"Don't lobsters deserve a fully intelligent GPT?"
Reason: Misinterpretation of '龙虾' (OpenClaw, a product name) as literal 'lobsters'; should be 'OpenClaw' not 'lobsters' | Misinterpretation - '难道龙虾就用不上满智力的GPT吗' means 'can't OpenClaw use full-intelligence GPT' not 'don't lobsters deserve' - 'lobster' is a nickname for OpenClaw, not literal | Paraphrases "龙虾就用不上" as a rhetorical question about what lobsters deserve; keeps meaning roughly but slightly shifts nuance from capability to entitlement. | Mistranslation of slang: '龙虾' (Lobster) refers to the specific project 'OpenClaw' or its community members, not the crustacean. Translating literally loses the context. |
| Qwen/Qwen3-14B | Title: Did your openclaw integration with GPT get nerfed? Content: 'Intellectual level' is only 20 but in codex it's 512. Can't lobsters use full-intelligence GPT? |
Major
[Accuracy]
"Can't lobsters use full-intelligence GPT?"
Reason: Source refers to '龙虾' (OpenClaw, a product name), not literal lobsters. Hypothesis mistranslates as 'lobsters' instead of the product name 'OpenClaw' as in reference. | "龙虾" here refers to the OpenClaw product/service, not literal lobsters; mistranslation of key term causes misunderstanding. | The source '龙虾' (Lobster) is a nickname for the project 'OpenClaw' (as seen in other items and context). Translating it literally as 'lobsters' (the animal) loses the specific referent to the software project. |
| Qwen/Qwen3.5-4B | Title: Did your openclaw integration with GPT nerf you? Content: "Intellectual level" is only 20, but Codex has 512. Can't lobsters use fully intelligent GPT? |
Major
[Accuracy]
"Did your openclaw integration with GPT nerf you?"
Reason: The term 'nerf' is less precise than 'shadowbanned/downgraded' in the reference. | Misinterprets “接入 GPT 后降智了吗?”; here it asks if OpenClaw got shadowbanned/downgraded after connecting to GPT, not generally ‘nerfed’ by the integration. | 降智 means 'reduced intelligence/dumbed down', not 'nerfed'. 'Nerf' is gaming terminology for weakening, but the source specifically refers to intelligence degradation. | Mistranslation of '降智' (reduced intelligence/dumbed down). 'Nerf you' implies weakening the user, whereas the source asks if the tool's intelligence was reduced.
Major
[Accuracy]
"Can't lobsters use fully intelligent GPT?"
Reason: 龙虾 is a proper noun referring to 'OpenClaw' (a specific AI system), not literal lobsters. Should be 'OpenClaw' not 'lobsters'. | “龙虾” in this context refers to OpenClaw, not literal lobsters; also misses nuance of ‘utilize the full-intelligence GPT’ tied to the 20 vs 512 levels. | Mistranslation of '龙虾' which is the nickname for the software 'OpenClaw' in this context, not the animal. | The phrasing is less precise than the reference's 'Can't OpenClaw utilize the full-intelligence GPT?' |
| Qwen/Qwen3.5-9B | Title: Did your OpenClaw integration with GPT result in nerfing? Content: The 'intelligence level' is only 20, but there's 512 in Codex—does Lobster not use full-intelligence GPT? |
Minor
[Style]
"result in nerfing"
Reason: Slang/neologism '降智' (nerfing/dumbing down) translated as 'nerfing' which is acceptable but could be more explicit | Colloquial gaming slang is acceptable but slightly diverges from the source nuance of being shadowbanned/downgraded; still understandable. | 'Nerfing' is a good gaming slang equivalent for '降智' (intelligence reduction), but the phrasing 'result in nerfing' is slightly less natural than 'get nerfed'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Have your OpenCLAW integrated with GPT caused intelligence levels to be downgraded? |
Major
[Accuracy]
"Have your OpenCLAW integrated with GPT caused intelligence levels to be downgraded?"
Reason: The hypothesis only translates the title and completely omits the content paragraph regarding specific intelligence levels (20 vs 512) and the question about utilizing full intelligence. | Missing content: '智力水平' translation 'intelligence level' is missing; 'codex' and '512' not translated; '龙虾' (crayfish/OpenClaw) not translated; question about full-intelligence GPT not conveyed | The hypothesis translates '降智' as 'downgraded' but the reference uses 'shadowbanned/downgraded', which better captures the nuance. More critically, the hypothesis misses the colloquial/slang nature of the original question and doesn't convey the sarcastic tone about capability reduction. | Mistranslates source rhetorical question about OpenClaw being shadowbanned/downgraded after connecting to GPT; misses specific meanings of "接入" (connecting) and "降智" (shadowban/downgrade), and loses mention of Codex and numeric intelligence levels.
Minor
[Terminology]
"OpenCLAW"
Reason: Inconsistent capitalization compared to the reference 'OpenClaw' and standard branding, though 'openclaw' in source is lowercase. | Incorrect translation: 'OpenCLAW' should be 'OpenClaw' (proper name) | Incorrect capitalization of 'openclaw'. | The source uses '龙虾' (lobster/prawn) as a nickname for OpenClaw, but the hypothesis uses 'OpenCLAW' instead of 'OpenClaw', inconsistent with reference capitalization. |
| google/gemma-3-12b-it | Title: Has your OpenClaw become dumber after integrating with GPT? Content: "Intelligence level" is only 20, but it was 512 in Codex. Does the lobster really need a fully intelligent GPT? |
Major
[Accuracy]
"Has your OpenClaw become dumber after integrating with GPT?"
Reason: Adds strong negative nuance 'become dumber' and omits idea of being shadowbanned/downgraded that appears in context; deviates from neutral query about connection to GPT. | The term 'dumber' may not accurately convey the intended meaning related to performance issues. | Mistranslation of the slang term 'openclaw' (referring to a specific project/persona) as the literal 'OpenClaw'. The reference correctly identifies the entity context or keeps the specific name, while 'OpenClaw' sounds like a generic tool name losing the specific community reference.
Major
[Accuracy]
"Does the lobster really need a fully intelligent GPT?"
Reason: Translation is opposite to source meaning - source asks if lobster CANNOT use full-intelligence GPT, hypothesis asks if it NEEDS it | Reverses meaning: source asks whether OpenClaw can use full-intelligence GPT, not whether it needs it; also replaces product name with literal 'lobster' instead of keeping OpenClaw-brand sense. | The source '难道龙虾就用不上...' implies 'Can't the lobster use/access...' (expressing inability or unfairness). The hypothesis 'Does the lobster really need...' changes the meaning to questioning necessity, which contradicts the context of complaining about reduced intelligence limits. |
| google/gemma-3-1b-it | Translation of '你们的 openclaw 接入 GPT 后降智了吗?' in en |
Minor
[Accuracy]
"Translation of '你们的 openclaw 接入 GPT 后降智了吗?' in en"
Reason: The translation does not capture the essence of the source text and deviates significantly from the reference. | Not an actual translation; it is a meta-description of the task and omits the meaning of the source/title and content. | The hypothesis is a meta-commentary describing the translation task rather than providing the actual translation of the source text. The content is completely missing. | Hypothesis is incomplete and does not provide actual translation. It only states 'Translation of...' without delivering the translated content. Reference shows proper translation: 'Did your OpenClaw get shadowbanned/downgraded after connecting to GPT?' |
| google/gemma-3-4b-it | Did your openclaw access to GPT result in a ‘shadowbanning’? |
Minor
[Accuracy]
"Did your openclaw access to GPT result in a ‘shadowbanning’?"
Reason: Source discusses '降智' (reduced intelligence/capability), not shadowbanning. Reference correctly translates as 'shadowbanned/downgraded'. Hypothesis uses wrong concept entirely. | Completely different meaning; source is about intelligence level limits and downgrading, not shadowbanning. | The hypothesis translates only the title and omits the entire content paragraph. It also mistranslates '降智' (reduced intelligence) as 'shadowbanning', missing the specific context of intelligence levels mentioned in the source. | The term 'shadowbanning' does not accurately translate the original meaning of '降智' which refers to a decrease in intelligence or capability, not shadowbanning. |
| google/translategemma-12b-it | Title: Did your OpenClaw become less intelligent after integrating with GPT? Content: The "intelligence level" is only 20, but it was 512 in Codex. Does OpenClaw not use GPT with full intelligence? |
Major
[Accuracy]
"openclaw"
Reason: Source uses '降智' (became less intelligent/dumbed down), but reference uses 'shadowbanned/downgraded' which better captures the nuance of the original complaint about degradation of service quality, not just intelligence level | The term 'shadowbanned/downgraded' in the reference is more specific than 'become less intelligent' in the hypothesis. | The proper noun refers to the specific product/service name and should be kept as "OpenClaw" to match source/reference capitalization and form. | Mistranslation of the slang term 'openclaw' (a specific project name) as the literal 'OpenClaw'. The reference implies it should be treated as a proper noun or kept closer to the source context, but 'OpenClaw' changes the meaning to a generic open claw. More critically, '降智' (intelligence reduction/lobotomization) is translated as 'become less intelligent', which is acceptable, but the term 'openclaw' is a specific community slang for a proxy tool often referred to as 'OpenClaw' in English contexts, however, the source uses lowercase. The bigger issue is '龙虾' (Lobster) in the source content is missing in the hypothesis translation ('Does OpenClaw not use...'), whereas the source asks 'Does the lobster...' referring to the user/tool personification. The hypothesis misses the personification 'Lobster'. |
| google/translategemma-4b-it | Did your openclaw access to GPT result in 'shadowbanning' or 'downgrading'? |
Major
[Accuracy]
"Did your openclaw access to GPT result in 'shadowbanning' or 'downgrading'?"
Reason: Invents concepts of 'shadowbanning' and 'downgrading' not present in source; omits key numeric details about intelligence levels and Codex vs GPT usage. | The source uses '降智' (reduced intelligence/dumbed down), not shadowbanning. The hypothesis misinterprets the meaning entirely. | The hypothesis translates '降智' (reduced intelligence) as 'shadowbanning', which is a completely different concept. It also omits the entire content section regarding intelligence levels (20 vs 512). | Title translation is incomplete - missing the Chinese term '降智' which means 'intelligence degradation' or 'dumbing down'. The reference correctly translates this as 'shadowbanned/downgraded' but hypothesis only mentions shadowbanning/downgrading without capturing the 'intelligence' aspect |
| tencent/HY-MT1.5-1.8B | Translation of '登录' in en |
Critical
[Accuracy]
"Translation of '登录' in en"
Reason: The source content is not translated at all, and '登录' appears out of nowhere, unrelated to the source and to the reference title and content. | The hypothesis is a meta-instruction describing a translation task for the wrong source word ('登录' vs 'openclaw') and language direction (to English instead of Chinese), failing to provide the actual translation. | Hypothesis is completely unrelated to source - appears to be a placeholder or wrong translation entirely | Hypothesis does not match source or reference content. The source discusses OpenClaw and GPT intelligence levels, not login ('登录'). The hypothesis appears to be unrelated to the actual translation task. | The translation is unrelated to the source content. |
| tencent/HY-MT1.5-7B | Does your OpenCLaw become less intelligent after integrating with GPT? |
Major
[Accuracy]
"Does your OpenCLaw become less intelligent after integrating with GPT?"
Reason: The translation does not capture the specific terms 'shadowbanned' or 'downgraded' from the reference. | The hypothesis only translates the title and completely omits the content paragraph regarding intelligence levels (20 vs 512) and the specific question about utilizing full intelligence. | Omitted key quantitative details about intelligence levels (20 vs 512) and the question about using full-intelligence GPT; also mistranslates OpenClaw casing and drops Codex reference. | Mistranslation: '降智' means 'shadowbanned/downgraded' not 'less intelligent'
Minor
[Terminology]
"OpenCLaw"
Reason: Inconsistent casing: source uses 'openclaw' (lowercase), hypothesis uses 'OpenCLaw' (camelCase) | Inconsistent capitalization compared to the reference 'OpenClaw' and common usage in the context. | Hypothesis uses 'OpenCLaw' but source and reference consistently use '龙虾' (lobster/crayfish) as the product name, not 'OpenClaw' |
| Model under test | MT output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Dragon Crab: Meow? ! Master ... There is some bad news |
Major
[Accuracy]
"Dragon Crab"
Reason: Incorrect translation: '龙虾' (crayfish/lobster) was mistranslated as 'Dragon Crab' instead of 'OpenClaw' | Source says '龙虾' (OpenClaw/lobster), not 'Dragon Crab'. Incorrect entity translation | Mistranslation of '龙虾' (lobster). 'Dragon Crab' is incorrect; the context implies the project name 'OpenClaw' or simply 'Lobster'. | Mistranslates “龙虾”, which in this context refers to OpenClaw/‘lobster’, not ‘dragon crab’.
Minor
[Style]
"There is some bad news"
Reason: Less direct than the reference | Lacks the personal agency present in the source and reference ('I have bad news'). The source implies the character is delivering the news directly. | Tone is more formal and weaker than the emphatic slangy “有个坏消息”; “I have bad news” would better match the style. |
| CohereLabs/tiny-aya-water | Title: Dragon Crab: Meow? ! Master ... There is some bad news |
Major
[Terminology]
"Dragon Crab"
Reason: Source '龙虾' (OpenClaw) is mistranslated as 'Dragon Crab'. Reference correctly identifies it as 'OpenClaw'. | Incorrect translation of '龙虾' as 'Dragon Crab' instead of 'OpenClaw'. | Incorrect entity name: 'Dragon Crab' should be 'OpenClaw' (the brand/product name) | Mistranslation of '龙虾'. In this context (referring to OpenClaw), it should be translated as 'Lobster' or kept as the proper noun 'OpenClaw' as seen in the reference. 'Dragon Crab' is factually incorrect. |
| Qwen/Qwen2.5-14B-Instruct | Title: Lobster: Meow meow?! Master... there's some bad news Content: Lobster: Meow meow?! Master... there's some bad news (°△°|||) Moltbook.ai has crashed! |
Major
[Accuracy]
"Lobster"
Reason: Incorrect entity - 'Lobster' should be 'OpenClaw' as per source and reference | Source '龙虾' is a product name (OpenClaw), not a literal lobster. Should be translated as 'OpenClaw' per reference. | Mistranslation of context. '龙虾' refers to the software 'OpenClaw', not the animal. | Source subject is the product/name 小龙虾/OpenClaw, not a literal lobster. |
| Qwen/Qwen2.5-7B-Instruct | Title: Lobster: Meow?!, Master... Bad News Content: Lobster: Meow?!, Master... Bad News (°△°|||) Moltbook.ai has crashed! |
Major
[Accuracy]
"Lobster: Meow?!, Master... Bad News"
Reason: '龙虾' is a product name (OpenClaw), not literal 'lobster'; reference correctly uses 'OpenClaw' | Mistranslation of slang: '龙虾' is the nickname for 'OpenClaw'. It should be translated as 'OpenClaw' to maintain context, not literally as 'Lobster'. | Source implies "有个坏消息" as "I have bad news"; omission of pronoun slightly reduces completeness but overall meaning is preserved. |
| Qwen/Qwen3-14B | Granchio: Miao miao miao?! Padrone... c'è una brutta notizia |
Major
[Accuracy]
"Granchio"
Reason: The source '龙虾' refers to the bot 'OpenClaw'. The hypothesis translates it as 'Granchio' (Crab) instead of the established name or 'Lobster', and the target language is Italian while the requested target was Chinese (zh), although the hypothesis provided is in Italian. Assuming evaluation of the provided hypothesis against source meaning: 'Granchio' is the wrong entity (Crab vs Lobster/OpenClaw). | Source subject is "龙虾" as product/role name, better rendered as "OpenClaw" rather than generic "granchio" (crab). | Hypothesis is in Italian, but target language is zh (Chinese). Should be translated to Chinese, not Italian. Also 'Granchio' (crab) and 'Padrone' (master) are incorrect - should be 'OpenClaw' and proper Chinese terms. |
| Qwen/Qwen3.5-4B | Title: Lobster: Meow? Master... Bad news! Content: Lobster: Meow? Master... Bad news! (°△°|||) Moltbook.ai is down! |
Major
[Accuracy]
"Lobster: Meow? Master... Bad news!"
Reason: 龙虾 is a proper noun referring to 'OpenClaw', not a literal lobster. Should be 'OpenClaw' not 'Lobster'. | Mistranslation of '龙虾' (OpenClaw bot nickname) as the animal 'Lobster' instead of the entity name 'OpenClaw'. | “喵喵喵” is repeated and more emphatic (“Meow meow meow?!”); “Lobster” is literal but the reference uses the product name OpenClaw, which is more accurate in this context. |
| Qwen/Qwen3.5-9B | Title: Lobster: Meow meow?! Master... there's bad news Content: Lobster: Meow meow?! Master... there's bad news (°△°|||) Moltbook.ai is down! |
Major
[Terminology]
"Lobster"
Reason: Should use the established product name OpenClaw rather than literal animal translation to match source/reference. | The term 'Lobster' is used instead of the correct term 'OpenClaw'. | 龙虾 refers to 'OpenClaw', not 'Lobster'. Reference correctly uses 'OpenClaw'. | Incorrect translation: 'Lobster' should be 'OpenClaw' (the entity name). The source uses '龙虾' as a nickname for OpenClaw, but in this context it's the character name that should remain as 'OpenClaw' per reference.
Minor
[Fluency]
"Meow meow?!"
Reason: The source has three '喵' characters ('喵喵喵'), but the hypothesis only translates two ('Meow meow'). The reference captures all three. | Hypothesis has 'Meow meow?!' while source and reference have 'Meow meow meow?!' - missing one 'meow' | Source has 喵喵喵 (three meows), hypothesis only has two. Reference correctly translates as 'Meow meow meow?!' |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Dragon prawn: Meow meow meow? Master... there's bad news from Moltbook.ai! |
Major
[Terminology]
"Dragon prawn"
Reason: The hypothesis translates '龙虾' as 'Dragon prawn' but the reference correctly identifies this as 'OpenClaw', the actual product name being referenced. | Incorrect entity; source and reference use OpenClaw (a product/name), not "Dragon prawn". | Incorrect translation: 'Dragon prawn' should be 'OpenClaw' (the character name)
Major
[Accuracy]
"there's bad news from Moltbook.ai!"
Reason: The source says 'Moltbook.ai is down/hung' (挂了), not that the news is *from* Moltbook. The hypothesis misses the specific status of the service being down. | The hypothesis incorrectly adds 'from Moltbook.ai' to the sentence structure, changing the meaning. The source indicates Moltbook.ai is down (separate statement), not the source of bad news. | Changes meaning: source states Moltbook.ai "已经挂了" (is down); hypothesis only says there is bad news from Moltbook.ai, omitting the explicit outage. | Wrong meaning: 'from' should be 'that' - Moltbook.ai is down, not news coming from it |
| google/gemma-3-12b-it | Title: Lobster: Meow meow meow?! Master... I have bad news Content: Lobster: Meow meow meow?! Master... I have bad news (°△°|||) Moltbook.ai is down! |
Minor
[Terminology]
"Lobster"
Reason: Literal translation loses established product/name term 'OpenClaw' used in context and reference. | Should be 'OpenClaw' not 'Lobster' - this is the character name in the source context | The term 'Lobster' is used instead of 'OpenClaw', which might be a nickname for the project. |
| google/gemma-3-1b-it | Dragon Shrimp: Is this a bad message?! Owner... there is a bad news! |
Major
[Accuracy]
"Is this a bad message?!"
Reason: Mistranslation of '喵喵喵' (Meow meow meow). The source indicates the character is making cat sounds, not asking about a message. | Mistranslation of '龙虾' as 'Dragon Shrimp' instead of 'OpenClaw'. Also '喵喵喵?!' should be 'Meow meow meow?!' not 'Is this a bad message?!'. Missing the second sentence about Moltbook.ai being down. | The translation misinterprets the source text, altering the intended meaning. | Source expresses confusion/meowing; reference has "Meow meow meow?!"; meaning is distorted by turning it into a question about a message. |
| google/gemma-3-4b-it | Lobster: Meow meow?! Master... I have bad news. |
Major
[Terminology]
"Lobster"
Reason: Source uses '龙虾' which in this context refers to 'OpenClaw' (a product name), not literal 'Lobster'. Reference correctly identifies this as 'OpenClaw'. | The term 'Lobster' is used instead of the more accurate 'OpenClaw' as per the reference. | Source/term refers to the product/handle '龙虾/OpenClaw', not literally the animal 'Lobster'. |
| google/translategemma-12b-it | "en": "Title: Lobster: Meow meow meow?! Master... I have bad news\nContent: Lobster: Meow meow meow?! Master... I have bad news (°△°|||)\ nMoltbook.ai is down!", "zh-TW": "標題:龍蝦:喵喵喵?!主人...有個壞消息\n內容:龍蝦:喵喵喵?!主人...有個壞消息 (°△°|||)\ nMoltbook.ai 已停止服務!", "ja": "タイトル:ロブスター:ニャーニャーニャ?ご主人様…悪い知らせがあります。\n内容:ロブスター:ニャーニャーニャ?ご主人様…悪い知らせがあります。 (°△°|||)\ nMoltbook.ai がダウンしました!", "ko": "제목: 랍스터: 야옹야옹? 주인님... 안 좋은 소식이 있습니다.\n내용: 랍스터: 야옹야옹? 주인님... 안 좋은 소식이 있습니다. (°△°|||)\ nMoltbook.ai가 다운되었습니다!", "de": "Titel: Hummer: Miau miau miau?! Herrchen... ich habe schlechte Nachrichten.\nInhalt: Hummer: Miau miau miau?! Herrchen... ich habe schlechte Nachrichten. (°△°|||)\ nMoltbook.ai ist ausgefallen!", "fr": "Titre : Homard : Miaou miaou miaou ?! Maître... j'ai une mauvaise nouvelle.\nContenu : Homard : Miaou miaou miaou ?! Maître... j'ai une mauvaise nouvelle. (°△°|||)\ nMoltbook.ai est hors service !", "it": "Titolo: Aragosta: Miao miao miao?! Padrone... ho una brutta notizia.\nContenuto: Aragosta: Miao miao miao?! Padrone... ho una brutta notizia. (°△°|||)\ nMoltbook.ai è offline!" } <|file_separator|> | Pass (no consensus errors) |
| google/translategemma-4b-it | Title: Lobster: Meow Meow?! Master... There's bad news |
Major
[Terminology]
"Lobster"
Reason: The source refers to '龙虾' (OpenClaw), not a generic 'Lobster'. The proper entity name should be used. | Incorrect translation of 'OpenClaw' as 'Lobster'. | Incorrect entity translation - 'Lobster' should be 'OpenClaw' (小龙虾 refers to OpenClaw, a project name, not a literal lobster) |
| tencent/HY-MT1.5-1.8B | Translation of '龙虾:喵喵喵?!主人...有个坏消息' in en |
Critical
[Accuracy]
"Translation of '龙虾:喵喵喵?!主人...有个坏消息' in en"
Reason: Hypothesis is completely unrelated to source - appears to be a placeholder or wrong translation entirely | It is only a placeholder description and provides no actual English translation, so it cannot be matched against the source sentence or the reference translation. | The hypothesis is a meta-instruction describing a translation task rather than providing the translated text. It also targets the wrong language (English) and only covers the title, ignoring the content. |
| tencent/HY-MT1.5-7B | Title: Lobster: Meow Meow Meow?! Master... there's some bad news. |
Major
[Terminology]
"Lobster"
Reason: While '龙虾' literally means lobster, the context (OpenClaw project) suggests 'OpenClaw' or keeping the mascot name consistent with Item 1 and the reference is preferred over the generic animal name. | Hypothesis translates '龙虾' as 'Lobster' but reference uses 'OpenClaw' which is the actual product name | Source/segment context refers to the branded name OpenClaw rather than generic 'Lobster'; brand term is not preserved. | Incorrect entity: source uses 'OpenClaw' (the AI tool), not 'Lobster' |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | What is the simplest method to deploy now? Is there any one-click deployment? Preferable open source on GitHub with more stars. |
Major
[Accuracy]
"What is the simplest method to deploy now?"
Reason: Title not translated - source is Chinese, hypothesis is English but not a proper translation of the source title | Omission of '小龙虾' (OpenClaw/Little Lobster). The translation asks about deploying anything, losing the specific subject matter. | Omission of the subject ‘小龙虾’ (OpenClaw); should specify what is being deployed, e.g. “deploy OpenClaw”. | Source title is '现在部署小龙虾最简单的方法是啥?' which should translate to 'What is the easiest way to deploy OpenClaw right now?' Hypothesis misses the subject 'OpenClaw' |
| CohereLabs/tiny-aya-water | The translation for 'Title: 现在部署小龙虾最简单的方法是啥?' is 'What is the simplest way to deploy Lingshkou now?' |
Critical
[Accuracy]
"Lingshkou"
Reason: Mistranslation of “小龙虾”, which refers to OpenClaw/lobster; “Lingshkou” is incorrect and unclear. | Nonsensical transliteration of '小龙虾'. Should be 'OpenClaw' as shown in reference. This is a proper noun referring to a specific project. | Severe hallucination/mistranslation. '小龙虾' (crayfish) is a slang term for 'OpenClaw' in this context. 'Lingshkou' is nonsensical and bears no relation to the source term. |
| Qwen/Qwen2.5-14B-Instruct | Title: What is the simplest way to deploy小龙虾now? Content: Is there one-click deployment available? Preferably open-source on GitHub with more stars? |
Major
[Fluency]
"deploy小龙虾"
Reason: Use of 'simplest' instead of 'easiest' | Mixed language (English and Chinese) in title. Should be fully translated. '小龙虾' is a product name (OpenClaw) that should be translated. | Code-mixing without spacing is unnatural in English; should be "deploy Xiaolongxia" or the established product name.
Major
[Accuracy]
"小龙虾"
Reason: Mixed language - '小龙虾' should be translated to 'OpenClaw' as in reference | Failure to translate. The term '小龙虾' (Crayfish) is used here as a nickname for 'OpenClaw' and should be translated or transliterated consistent with the project name, not left in Chinese characters within an English sentence. | Product name should be translated as 'OpenClaw' per reference, not left in Chinese. |
| Qwen/Qwen2.5-7B-Instruct | What's the simplest way to deploy小龙虾 now? Is there one-click deployment available? Preferably from GitHub with many stars. | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Qual è il modo più semplice per distribuire il granchio adesso? |
Critical
[Accuracy]
"distribuire il granchio"
Reason: Hypothesis is in Italian, but target language is zh (Chinese). Should be translated to Chinese. Also 'granchio' (crab) is incorrect - should be 'OpenClaw'. | "小龙虾" here is the OpenClaw project, not a literal crab/lobster; misrenders the key entity. | The hypothesis is in Italian, not the target language Chinese (zh). It also mistranslates '小龙虾' (OpenClaw/Crayfish) as 'granchio' (Crab). Furthermore, it completely omits the content section regarding 'one-click deployment' and 'GitHub open-source'. |
| Qwen/Qwen3.5-4B | What is the easiest way to deploy crabs now? |
Major
[Accuracy]
"What is the easiest way to deploy crabs now?"
Reason: 小龙虾 refers to 'OpenClaw' (a specific system), not literal crabs. Additionally, the hypothesis is missing the content entirely. | “小龙虾” here is the product OpenClaw, not “crabs”; also omits the question about one-click deployment and GitHub open-source with many stars, losing key content. | Severe mistranslation of '小龙虾' (OpenClaw) as 'crabs' (the animal), losing the entire context of the software tool. |
| Qwen/Qwen3.5-9B | Title: What is the easiest way to deploy Xiaolongxia now? Content: Is there one-click deployment? Preferably open source on GitHub with many stars. |
Major
[Terminology]
"Xiaolongxia"
Reason: Incorrect translation: '小龙虾' (Xiaolongxia/crayfish) is a nickname for OpenClaw and should be translated as 'OpenClaw' per reference. Using pinyin 'Xiaolongxia' is not appropriate here. | Should use OpenClaw as in source/reference instead of transliterated 'Xiaolongxia' for the product name. | 小龙虾 is a codename for 'OpenClaw', not a literal translation 'Xiaolongxia'. Reference correctly identifies as 'OpenClaw'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | What is the simplest way to deploy small shrimp now? |
Major
[Terminology]
"small shrimp"
Reason: Incorrect translation: 'small shrimp' should be 'OpenClaw' (proper name for 小龙虾 in this context) | Mistranslates 小龙虾, which in this context is the product/name OpenClaw, not literal "small shrimp". | The hypothesis translates '小龙虾' as 'small shrimp' but the reference correctly identifies this as 'OpenClaw', the product name. This is a critical terminology error.
Major
[Accuracy]
"What is the simplest way to deploy small shrimp now?"
Reason: Mistranslation of '小龙虾'. In this technical context, it refers to the 'OpenClaw' project. Translating it literally as 'small shrimp' loses the specific entity reference. | The hypothesis fails to recognize '小龙虾' as a nickname/reference to OpenClaw and provides a literal but incorrect translation that obscures the actual meaning. | Omits the second sentence about one-click deployment and GitHub open-source with many stars, thus incomplete translation of the content. | Incorrect translation of '小龙虾' which should be 'OpenClaw'. |
| google/gemma-3-12b-it | Title: What's the easiest way to deploy crawfish now? Content: Is there a one-click deployment option? Ideally, an open-source GitHub project with many stars. | Pass (no consensus errors) |
| google/gemma-3-1b-it | What is the simplest way to deploy small shrimp now? |
Major
[Terminology]
"small shrimp"
Reason: Mistranslation of '小龙虾'. In this technical context, it is the nickname for the 'OpenClaw' project, not the literal animal 'small shrimp' or 'crayfish'. | The term 'small shrimp' is used instead of the proper noun 'OpenClaw', leading to confusion. | Misinterprets "小龙虾" which in this context refers to the project/name "OpenClaw", not literal shrimp.
Major
[Accuracy]
"What is the simplest way to deploy small shrimp now?"
Reason: The hypothesis omits the entire content section of the source text regarding one-click deployment and GitHub stars. | Only translates the title and omits the content asking about one-click deployment and GitHub open-source with many stars. | '小龙虾' (OpenClaw) is mistranslated as 'small shrimp'. Missing the content translation about one-click deployment and GitHub requirements. |
| google/gemma-3-4b-it | What’s the easiest way to deploy crayfish now? |
Major
[Terminology]
"crayfish"
Reason: Term refers to the product '小龙虾/OpenClaw', not the literal animal 'crayfish'. | The term 'crayfish' is used instead of the more accurate 'OpenClaw' as per the reference. | Source '小龙虾' refers to 'OpenClaw' (product name), not literal crayfish. Reference correctly translates as 'OpenClaw'. |
| google/translategemma-12b-it | Title: What is the easiest way to deploy Lobster right now? Content: Is there a one-click deployment option? Ideally, it should be open source on GitHub and have a large number of stars. |
Major
[Accuracy]
"Lobster"
Reason: Should preserve the specific name "OpenClaw" instead of generic "Lobster" to match product/brand reference. | Mistranslation of '小龙虾' (Crayfish/Little Lobster). In this specific community context (referring to the 'OpenClaw' project), '小龙虾' is a slang term for the tool. The reference uses 'OpenClaw'. Translating it literally as 'Lobster' loses the specific referent to the software project. | Source refers to '小龙虾' which is a product name (OpenClaw), not a literal lobster. Should be 'OpenClaw' not 'Lobster' to maintain consistency with the product being discussed. |
| google/translategemma-4b-it | What is the easiest way to deploy crawfish now? |
Major
[Terminology]
"crawfish"
Reason: Uses generic 'crawfish' instead of the specific project name 小龙虾/OpenClaw used in context. | The source refers to '小龙虾' (OpenClaw), not crawfish. This is a mistranslation of a proper entity name. | Incorrect entity translation - 'crawfish' should be 'OpenClaw' (小龙虾 refers to OpenClaw, a project name, not a literal crawfish) | Incorrect translation of 'OpenClaw' as 'crawfish'. |
| tencent/HY-MT1.5-1.8B | Il modo più semplice per distribuire gli omaccetti ora è... |
Critical
[Accuracy]
"Il modo più semplice per distribuire gli omaccetti ora è..."
Reason: The translation is in Italian instead of Chinese. | Hypothesis is in Italian instead of target language Chinese (zh), and content is unrelated to source | The hypothesis is in Italian, but the target language is Chinese (zh). Additionally, 'omaccetti' is an incorrect translation for '小龙虾' (which refers to 'OpenClaw' in this context, literally 'little crayfish'). | The translation is in Italian and its meaning does not match the content about “deploying 小龙虾/OpenClaw”; “omaccetti” is completely unrelated to the source. | Hypothesis is in Italian, not English. Target language is zh (Chinese), but hypothesis should be English translation. Content is completely incorrect and unrelated to the source about deploying OpenClaw. |
| tencent/HY-MT1.5-7B | "en": "What is the simplest way to deploy crayfish now?", "zh-TW": "現在部署小龙蝦最簡單的方法是什麼?」, "ja": "現在、クライフィッシュを配備する最も簡単な方法は何ですか?", "ko": "지금 크라이피시를 배포하는 가장 간단한 방법은 무엇인가요?", "de": "Was ist der einfachste Weg, um momentan Krabben zu verteilen?", "fr": "Quel est le moyen le plus simple de déployer des crevettes actuellement ?", "it": "Qual è il modo più semplice per distribuire le gamberetti al momento?" }|im_end|> |
Major
[Terminology]
""en": "What is the simplest way to deploy crayfish now?""
Reason: The term 'crayfish' is used instead of 'OpenClaw', which is a critical error as it changes the subject. | '小龙虾' here is established as the product name OpenClaw, not literal crayfish; brand term not used. | Incorrect entity: '小龙虾' in context refers to 'OpenClaw' (the AI tool), not actual crayfish | Hypothesis translates '小龙虾' as 'crayfish' but reference correctly identifies this as 'OpenClaw', the product name | Mistranslation of the project nickname '小龙虾' (OpenClaw). 'Crayfish' is the literal animal, whereas the context refers to the software project. |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | The four accounts were all opened with bypassing the verification process... |
Critical
[Accuracy]
"were all opened with bypassing the verification process"
Reason: Major mistranslation: '反代' (reverse proxy) was translated as 'bypassing the verification process' which is incorrect; '反重力' (Antigravity) was omitted entirely; context about distinguishing Student Pro vs reverse proxy is lost | Misinterpretation of '反代' as 'bypassing the verification process' instead of 'reverse proxies' | Hypothesis completely misrepresents source content. Source discusses reverse proxies and Antigravity, not verification bypass. Major semantic error | Severe mistranslation of '反代' (reverse proxy). The hypothesis invents a meaning about verification bypassing which is not in the source. | Mistranslates “都开了反代” (all had reverse proxies enabled) as bypassing verification, changing the technical meaning. |
| CohereLabs/tiny-aya-water | These four accounts have all been opened with bypassing... |
Major
[Accuracy]
"have all been opened with bypassing"
Reason: Incomplete translation. The hypothesis cuts off mid-sentence and fails to translate the Title and the rest of the Content. 'Opened with bypassing' is also an awkward rendering of '开了反代' (enabled reverse proxy). | Incomplete and inaccurate translation. Source says '这四个号都开了反代' meaning 'All four accounts had reverse proxies enabled'. Hypothesis is fragmentary and unclear. | “开了反代” means enabled reverse proxy, not a vague ‘opened with bypassing’; key technical meaning lost. | Incomplete translation: '反代' (reverse proxy) is translated as just 'bypassing' which loses the specific technical meaning; 'Title' is not translated
Major
[Terminology]
"bypassing"
Reason: Incorrect translation of '开了反代' as 'opened with bypassing' instead of 'reverse proxies enabled'. | Incorrect term: 'bypassing' should be 'reverse proxy' to match the source terminology | Incorrect term. Should be 'reverse proxies' as in reference, not 'bypassing'. |
| Qwen/Qwen2.5-14B-Instruct | Anche senza ban degli account, l'antigravitazione sembra ancora distinguere tra le restrizioni per gli studenti Pro e i proxy |
Critical
[Accuracy]
"Anche senza ban degli account, l'antigravitazione sembra ancora distinguere tra le restrizioni per gli studenti Pro e i proxy"
Reason: Target language required is Chinese but the hypothesis is in Italian; content is not in the requested target language. | Wrong target language - output is in Italian instead of Chinese (zh) | Wrong target language. The hypothesis is in Italian, but the requested target language is Chinese (zh). Additionally, the content part of the source text is completely missing in the hypothesis. | Hypothesis is in Italian, but target language is zh (Chinese). This is a complete language mismatch. Should be in English or Chinese. |
| Qwen/Qwen2.5-7B-Instruct | Even without bans, anti-gravity seems to still differentiate between student pros and proxies... |
Major
[Accuracy]
"Even without bans, anti-gravity seems to still differentiate between student pros and proxies..."
Reason: The phrase 'student pros and proxies' is less precise compared to 'Student Pro vs Reverse Proxy' in the reference. | Incomplete translation; hypothesis cuts off mid-sentence and misses significant content present in source and reference | Incomplete translation: The hypothesis only translates the title and omits the entire content section provided in the source. | Incomplete translation - missing Content portion entirely |
| Qwen/Qwen3-14B | Anche senza essere bloccati, Anti-Gravity sembra comunque distinguere e limitare tra Student Pro e Anti-Cheat |
Major
[Accuracy]
"Anti-Cheat"
Reason: "反代" means reverse proxy, not anti-cheat; mistranslates a technical term. | Hypothesis is in Italian, but target language is zh (Chinese). Should be translated to Chinese, not Italian. | The source term '反代' is an abbreviation for '反向代理' (Reverse Proxy). The hypothesis translates it as 'Anti-Cheat', which is semantically incorrect in this technical context. The reference correctly uses 'Reverse Proxy'. |
| Qwen/Qwen3.5-4B | Even without being banned, Anti-Gravity seems to still distinguish between student pro and anti-sub. |
Minor
[Terminology]
"Anti-Gravity"
Reason: Incorrect translation of '反代'. In this technical context, it stands for 'reverse proxy' (as in the reference), not 'anti-subscription' or 'anti-sub'. | Inconsistent casing compared to established name “Antigravity”; should be treated as a proper product name. | The term 'anti-sub' is less precise than 'reverse proxy' in the reference.
Major
[Accuracy]
"anti-sub"
Reason: “反代” means reverse proxy, not ‘anti-sub’; this mistranslates a key technical term. | 反代 means 'reverse proxy', not 'anti-sub'. The translation should be 'Reverse Proxy' not 'anti-sub'. | The source implies distinguishing and limiting (区别限制), the hypothesis only captures 'distinguish'. |
| Qwen/Qwen3.5-9B | Title: Even without being banned, anti-gravity seems to still distinguish between student pro and anti-proxy Content: All four accounts have enabled anti-proxy... simultaneously using anti-gravity normally... today on anti-proxy... |
Major
[Terminology]
"anti-proxy"
Reason: 反重力 refers to 'Antigravity' (a service/tool name), not a literal translation. Reference correctly uses 'Antigravity'. | Inconsistent terminology: '反重力' should be 'Antigravity' (as proper noun) not 'anti-gravity'. Hypothesis uses lowercase 'anti-gravity' while reference uses capitalized 'Antigravity' as a service name. | Should use the standard term 'reverse proxy' for '反代' as in the reference. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Even without a ban, gravity seems to still distinguish between student pro and anti-gravity. |
Major
[Terminology]
"gravity"
Reason: Incorrect capitalization of '反重力' which should be 'Antigravity'. | Mistranslation of '反重力'. In this context, it is a proper noun for a specific service/tool ('Antigravity'), not the physical concept of gravity. | Improper term; the product/service name is Antigravity, not literal "gravity". | Fails to recognize '反重力' as a proper noun (Antigravity service) and instead translates it literally as 'gravity', fundamentally changing the meaning.
Minor
[Accuracy]
"distinguish between student pro and anti-gravity"
Reason: The hypothesis severely misinterprets the source. '反重力' (Antigravity) is a product/service name, not a literal concept. The translation conflates 'Antigravity' with 'anti-gravity' and misrepresents the technical meaning entirely. | Mistranslation of '反代'. '反代' is short for '反向代理' (Reverse Proxy). The hypothesis translates it as 'anti-gravity', confusing it with the previous term, whereas the reference correctly identifies it as 'Reverse Proxy'. | Mistranslates "限制学生pro和反代"; should be about limiting Student Pro and reverse proxy, not "anti-gravity" as a contrasted type. | Completely mistranslated: '反重力' is Antigravity (service name), not 'gravity'; '反代' is reverse proxy, not 'anti-gravity'; entire meaning is wrong |
| google/gemma-3-12b-it | Title: Even without bans, anti-gravity seems to still differentiate between student pro and reverse proxies Content: All four accounts have reverse proxies enabled...with normal use of anti-gravity...today on a reverse proxy... | Pass (no consensus errors) |
| google/gemma-3-1b-it | Even without a ban, anti-detection seems to still be able to distinguish between student pro and anti-do |
Major
[Terminology]
"anti-detection"
Reason: The term 'anti-do' is incorrect and does not match the reference's 'reverse proxy'. | Incorrect term; should be the product/service name "Antigravity" rather than a generic "anti-detection". | Incorrect translation of '反代'. In this context, it stands for 'Reverse Proxy', not 'anti-do'.
Major
[Accuracy]
"anti-do"
Reason: Incomplete and inaccurate translation. 'anti-do' is truncated/nonsensical. Missing proper translation of '反重力' (Antigravity) and '反代' (reverse proxy). Content is not translated at all. | Mistranslation of "反代" which means "reverse proxy"; output is nonsensical and wrong. | The hypothesis omits the content section of the source text entirely. |
| google/gemma-3-4b-it | Even without being banned, anti-gravity seems to still differentiate between restricted student pro and cheating. |
Major
[Accuracy]
"cheating"
Reason: The hypothesis omits the content entirely. It also mistranslates '反代' (reverse proxy) as 'cheating' and fails to properly contextualize 'Student Pro' vs 'Reverse Proxy'. | Mistranslation of '反代' which means 'reverse proxy', not 'cheating'. | Hypothesis misinterprets the meaning. Source discusses technical distinction between 'Student Pro' and 'Reverse Proxy' usage patterns. Hypothesis incorrectly frames it as 'cheating' and misses the technical context. |
| google/translategemma-12b-it | Title: Even without being banned, anti-gravity seems to still differentiate between student pro and reverse proxy accounts. |
Major
[Accuracy]
"reverse proxy accounts"
Reason: Incomplete translation. The hypothesis only translates the Title and omits the entire Content section ('这四个号都开了反代...'). | Source says "限制学生pro和反代" (distinguish/limit Student Pro and reverse proxies). "Accounts" is not explicit in source and slightly changes nuance from service types to account types. | Hypothesis only provides title translation but source content is incomplete/truncated. Reference provides both title and content translation. Hypothesis is missing the content translation entirely.
Minor
[Terminology]
"anti-gravity"
Reason: While 'anti-gravity' is the literal translation of '反重力', in this context it refers to a specific service/tool name (Antigravity). Capitalization or treating it as a proper noun would be more accurate based on the reference. | Should be 'Antigravity' (proper noun/product name) not 'anti-gravity' (generic term) | The term should be capitalized as 'Antigravity' to match the reference. |
| google/translategemma-4b-it | Even without being banned, anti-gravity seems to still differentiate between pro and cheating students. |
Major
[Accuracy]
"cheating students"
Reason: Misrepresentation of 'Student Pro vs Reverse Proxy'. | Misinterprets '反代' (reverse proxy) as 'cheating students', changing the meaning. | Translation error: 'pro and cheating students' should be 'Student Pro vs Reverse Proxy' - the source refers to account types (Student Pro subscription) and reverse proxy usage, not students who cheat | The source '学生 pro' refers to a specific subscription tier ('Student Pro'), not students who are cheating. This is a severe mistranslation of the entity. | The source discusses '反重力' (Antigravity) distinguishing between '限制学生pro和反代' (limited Student Pro and reverse proxies), not 'cheating students'. The hypothesis misinterprets the technical context. |
| tencent/HY-MT1.5-1.8B | "en": "Translation of '即使没有封号,反重力似乎仍能区别限制学生pro和反代' in en", "zh-TW": "即使沒有封號,反重力似乎仍能區分限制學生pro和反代", "ja": "無論是否有封鎖,反重力似乎仍然可以區分學生的限制pro和反代", "ko": "비록 봉인이 없더라도 반중력은 학생들의 제한 pro와 반대를 구분할 수 있는 것 같다", "de": "Obwohl es keine Sperre gibt, scheint die Anti-Gravitation in der Lage zu sein, die Einschränkungen des Studenten pro und das Gegenteil zu unterscheiden", "fr": "Même sans blocage, la gravité anti-apparente semble capable de distinguer les restrictions du studente pro et le contraire", "it": "Anche senza blocco, la gravità anti-apparente sembra essere in grado di distinguere i limiti del studente pro e il contrario." } | Pass (no consensus errors) |
| tencent/HY-MT1.5-7B | "en": "Title: Even without being banned, it seems that Anti-Gravity can still distinguish between restricting students Pro and anti-substitution.", "zh-TW": "Title: Even without being banned, it seems that Anti-Gravity can still distinguish between restricting students Pro and anti-substitution.", "ja": "Title: 禁止されていなくても、Anti-Gravityは依然として学生Proの制限と反代替を区別できるようです。」, "ko": "Title: 비록 금지되어 있지 않아도, Anti-Gravity는 여전히 학생 Pro의 제한과 반대체를 구별할 수 있는 것으로 보입니다.", "de":="Titel: Auch ohne Sperre scheint es, als könnte Anti-Gravity zwischen der Einschränkung von Studenten Pro und der Anti-Ersatzmethode unterscheiden.", "fr":="Titre: Même sans être interdit, il semble que Anti-Gravity puisse encore distinguer entre la restriction des étudiants Pro et la méthode anti-substitution.", "it":="Titolo: Anche senza essere vietato, sembra che Anti-Gravity possa ancora distinguere tra la restrizione degli studenti Pro e il metodo anti-sostituzione."} | Pass (no consensus errors) |
| Model under test | MT output (Hypothesis) | Arbitration result (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: FlyBook's entry into the small scorpion interface |
Major
[Accuracy]
"small scorpion"
Reason: Mistranslates “小龙虾” (OpenClaw/lobster) as “small scorpion”, changing the product/name. | Hallucination/Mistranslation. '小龙虾' in this context refers to the project 'OpenClaw'. Translating it literally as 'small scorpion' (which is also biologically incorrect as it's a lobster/crayfish) makes the title nonsensical. | Source refers to '飞书' (Feishu) and '小龙虾' (OpenClaw), not 'FlyBook' and 'small scorpion'. Incorrect terminology and entity names |
| CohereLabs/tiny-aya-water | Title: FlyBook's entry into the small shrimp interface |
Major
[Terminology]
"FlyBook"
Reason: Incorrect translation of '飞书'. The standard English name for this product is 'Feishu' (or Lark internationally), not a literal translation 'FlyBook'. | Nonstandard rendering of “飞书”; using established name “Feishu” (or Lark) would be more appropriate. | Incorrect entity names: 'FlyBook' should be 'Feishu'; 'small shrimp' should be 'OpenClaw' (the brand/product name); 'interface' should be 'API' | Multiple errors: '飞书' is 'Feishu' not 'FlyBook'; '小龙虾' is 'OpenClaw' not 'small shrimp'; '接口' means 'API' not 'interface' in this context. |
| Qwen/Qwen2.5-14B-Instruct | Title: Lark Joins the Xiaolongxia API Content: Lark changed 10,000 calls to 1 million calls, which can be considered as catching this wave of open-source traffic |
Major
[Terminology]
"Xiaolongxia"
Reason: Incorrect use of 'Xiaolongxia' instead of 'OpenClaw' | Should use the established product name OpenClaw; Xiaolongxia is a literal transliteration and loses branding consistency. | Inconsistent terminology. '小龙虾' is established as 'OpenClaw' in previous items. Using the pinyin 'Xiaolongxia' breaks consistency and obscures the specific software reference. |
| Qwen/Qwen2.5-7B-Instruct | Title: Feishu接纳LinuxDo虾米接口 Content: Feishu will change 10,000 calls to 1 million calls, also catching this wave of open-source traffic |
Major
[Accuracy]
"Feishu接纳LinuxDo虾米接口"
Reason: Incorrect translation; should be 'Feishu enters the OpenClaw API game' per reference | Introduces "LinuxDo虾米接口" not present in source and changes "小龙虾接口" (OpenClaw API) to something else; incorrect entity and added content. | Incorrect translation - '飞书' (Feishu) is correct but '虾米' is wrong; should be 'OpenClaw' (小龙虾 is nickname) | Hallucination and mistranslation: The title completely changes meaning. '小龙虾' (OpenClaw) is mistranslated as 'LinuxDo 虾米' (LinuxDo Shrimp), and '入局' (enters the game/field) is changed to '接纳' (accepts). |
| Qwen/Qwen3-14B | Feishu entra nell'interfaccia Xiao Long虾 | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Feishu enters the crayfish API market; Feishu increases call volume from 1 million to 10 million calls, effectively capturing this wave of open-source traffic. |
Minor
[Accuracy]
"Feishu increases call volume from 1 million to 10 million calls"
Reason: Numerical error. Source says '1 万次' (10,000) to '100 万次' (1,000,000). Hypothesis says 1 million to 10 million. | The numbers are misrepresented; the correct figures are 10,000 to 1,000,000 as in the reference. | Source says from 10,000 to 1,000,000 calls; the numbers are incorrectly scaled by a factor of 100. | The numbers are reversed. Source says 1万次 (10,000) changed to 100万次 (1,000,000), but hypothesis states '1 million to 10 million'. |
| Qwen/Qwen3.5-9B | Title: Feishu enters the crayfish interface market; Content: By increasing API calls from 10k to 1M, Feishu has successfully captured this wave of open-source traffic. |
Major
[Terminology]
"crayfish interface market"
Reason: Incorrect translation: '小龙虾' (crayfish) is a nickname for OpenClaw and should be 'OpenClaw API' not 'crayfish interface' per reference. | 小龙虾 is a codename for 'OpenClaw', not 'crayfish'. Reference correctly uses 'OpenClaw API'. | The term 'crayfish interface' is not the most appropriate translation for the context. | Original refers to the OpenClaw API; should use the product name OpenClaw instead of a literal crayfish/interface market phrasing. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: Joining Shopee's Small Prawn Interface Content: Shopee will have 10,000 calls changed to 1,000,000 calls, which is considered as catching this wave of open-source traffic. |
Minor
[Accuracy]
"Shopee"
Reason: Hallucination/Mistranslation. '飞书' is 'Feishu' (Lark), a collaboration tool by ByteDance. 'Shopee' is a completely different e-commerce platform. This changes the subject entirely. | Due to incorrect company identification, the entire meaning is distorted. Should be 'Feishu changed its call limit'. | Incorrect translation of '飞书' which should be 'Feishu'. | Wrong entity; source mentions 飞书 (Feishu/Lark), not Shopee.
Major
[Terminology]
"Small Prawn Interface"
Reason: The hypothesis incorrectly translates '飞书' as 'Shopee' (wrong company entirely) and '小龙虾' as 'Small Prawn' instead of recognizing it as 'OpenClaw'. The reference correctly identifies '飞书' as 'Feishu'. | Mistranslates 小龙虾, which here refers to OpenClaw API, not "Small Prawn Interface" literally. | Incorrect: 'Small Prawn' should be 'OpenClaw' (小龙虾 refers to OpenClaw project) |
| google/gemma-3-12b-it | Title: Feishu Joins the Crawfish Interface Frenzy Content: Feishu is increasing call limits from 10,000 to 1 million calls, effectively catching this wave of open-source traffic. |
Minor
[Terminology]
"Crawfish"
Reason: The term 'Crawfish' is used instead of 'OpenClaw', which might be a nickname for the project. | Should be 'OpenClaw API game' not 'Crawfish Interface Frenzy' - OpenClaw is the product name in source | Uses generic 'Crawfish' instead of the established product/name 'OpenClaw' implied by 小龙虾 in this community context. |
| google/gemma-3-1b-it | Translation of '飞书入局小龙虾接口' in en |
Minor
[Accuracy]
"Translation of '飞书入局小龙虾接口' in en"
Reason: The translation lacks context and does not align with the reference's meaningful title. | Hypothesis is incomplete and does not provide actual translation. It only states 'Translation of...' without delivering the translated content. Reference shows proper translation: 'Feishu enters the OpenClaw API game' | No actual translation provided; just a meta-statement about translating. | The hypothesis is a meta-commentary describing the translation task rather than providing the actual translation. The content is completely missing. |
| google/gemma-3-4b-it | WeChat Entry Shrimp Interface |
Critical
[Accuracy]
"WeChat"
Reason: The translation does not accurately reflect the original content, which refers to '飞书' (Feishu) and '小龙虾接口' (OpenClaw API). | Wrong platform; source refers to 飞书 (Feishu/Lark), not WeChat. | Completely incorrect translation. Source mentions '飞书' (Feishu, a collaboration platform) and '小龙虾接口' (OpenClaw API). Hypothesis incorrectly translates as 'WeChat' and 'Shrimp Interface'. | The hypothesis mistranslates '飞书' (Feishu/Lark) as 'WeChat'. It also translates '小龙虾' literally as 'Shrimp' instead of 'OpenClaw' and omits the entire content regarding API call limits. |
| google/translategemma-12b-it | Title: Feishu Integration for Crawfish API Content: Feishu is increasing the number of calls from 10,000 to 1 million, which can be considered as capturing this wave of open-source traffic. |
Major
[Accuracy]
"Crawfish API"
Reason: Mistranslation of '小龙虾'. Similar to Item 3, this is community slang for the 'OpenClaw' project. Translating it as 'Crawfish' loses the specific entity reference. | Original refers to "小龙虾" as the known product "OpenClaw"; "Crawfish" loses the brand/product naming used in reference. | Source refers to '小龙虾' (OpenClaw product), not 'Crawfish'. Should be 'OpenClaw API' not 'Crawfish API'. Also, 'enters the game' is more accurate than 'Integration for'.
Minor
[Accuracy]
"Feishu is increasing the number of calls from 10,000 to 1 million"
Reason: Slight nuance difference: 'increasing' vs 'changed' - reference uses 'changed' which is more accurate as it's a modification of the limit, not necessarily an increase from the user's perspective | Source says '改成' (changed to), not 'increasing'. Reference correctly uses 'changed' which is more accurate than 'increasing'. | The phrase is less precise than the reference's 'changed its call limit'. |
| google/translategemma-4b-it | Title: Wechat Integration - Lobster API |
Critical
[Accuracy]
"Wechat Integration - Lobster API"
Reason: The source mentions '飞书' (Feishu), not WeChat. This is a fundamental mistranslation of the main subject. | Misidentifies platform as WeChat instead of Feishu and uses 'Lobster' instead of specific OpenClaw API; also omits key detail about increasing call limits. | The source explicitly says '飞书' (Feishu/Lark). The hypothesis incorrectly translates this as 'Wechat', which is a completely different product.
Major
[Terminology]
"Lobster API"
Reason: Incorrect entity translation - 'Wechat' should be 'Feishu' and 'Lobster' should be 'OpenClaw' | Should be 'OpenClaw API', not 'Lobster API'. The proper entity name is required. | Incorrect translation of 'OpenClaw' as 'Lobster'. |
| tencent/HY-MT1.5-1.8B | Translation of '飞书入局小龙虾接口' in en |
Critical
[Accuracy]
"Translation of '飞书入局小龙虾接口' in en"
Reason: Hypothesis is completely unrelated to source - appears to be a placeholder or wrong translation entirely | The hypothesis is a meta-instruction describing a translation task for the wrong language direction (to English) and only covers the title, ignoring the content. | Only a placeholder description; no actual English translation is given, so it cannot be matched to the reference “Feishu enters the OpenClaw API game”. | Hypothesis is a meta-description of the translation task rather than an actual translation. It should provide the English translation, not describe what should be translated. |
| tencent/HY-MT1.5-7B | Title: Feishu Enters the Dragon Shrimp Interface |
Major
[Terminology]
"Dragon Shrimp Interface"
Reason: Literal mistranslation of '小龙虾'. In this context, it refers to the 'OpenClaw' project, not the animal 'dragon shrimp' or 'crayfish'. | '小龙虾' is the product OpenClaw and should be rendered as OpenClaw/API, not literal 'Dragon Shrimp'. | The term 'Dragon Shrimp' is used instead of 'OpenClaw', which is a critical error as it misrepresents the product. | Hypothesis translates '小龙虾接口' literally as 'Dragon Shrimp Interface' but reference correctly uses 'OpenClaw API' which is the product name | Incorrect entity: '小龙虾' refers to 'OpenClaw' (the AI tool), not 'Dragon Shrimp' |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: Share navigation of 88 public utility stations, and ask if there is any Claude's public utility station replacement |
Major
[Accuracy]
"public utility stations"
Reason: Incorrect translation: '公益站' (free API proxy sites) was translated as 'public utility stations' which loses the technical meaning; 'claude平替站' (Claude alternative sites) was incorrectly rendered as 'Claude's public utility station replacement' | Mistranslates “公益站” (free/public-benefit sites, often free proxy/API sites) as literal ‘public utility stations’, which suggests infrastructure utilities rather than websites. | Mistranslation of '公益站'. In this tech context, it refers to 'free API proxy sites' or 'community-run nodes', not physical public utility stations (like water or electric). | Source discusses '公益站' (free API proxy sites), not 'public utility stations'. Also misses context about Antigravity being banned. Incomplete and inaccurate translation |
| CohereLabs/tiny-aya-water | Title: Share navigation of 88 free public proxy sites, and ask if there is any Claude's free proxy site |
Major
[Accuracy]
"navigation of 88 free public proxy sites"
Reason: Title translation incomplete: missing ':fire:' emoji; 'free public proxy sites' is redundant; 'Claude's free proxy site' is grammatically incorrect and loses the meaning of '平替站' (alternative sites) | Source is about a directory of 88 public公益站, specifically API-related, not generic “proxy sites”. | Incomplete translation. Source asks '还有claude平替站么' meaning 'are there any other alternative sites for Claude?' Hypothesis misses the context of alternatives. | The hypothesis omits the translation of the Content section entirely.
Minor
[Terminology]
"free public proxy sites"
Reason: Translation of 'Claude的公益站平替' as 'Claude's free proxy site' is slightly off; should be 'Claude's free proxy alternatives'. | Source says '公益站' (free API proxy sites), not 'public proxy sites'. Reference correctly identifies these as 'free API proxy sites'. | '公益站' in this context refers to free API proxy services provided by the community. 'Public proxy sites' can imply open, unauthenticated proxies which is slightly different, though 'free API proxy sites' (reference) is more precise. |
| Qwen/Qwen2.5-14B-Instruct | Title: Navigation of 88 Free Shared Sites, Seeking Claude Alternatives Content: Anti-gravity was banned, and I found this navigation with 88 free shared sites while browsing forums. Are there any Claude alternatives? |
Minor
[Accuracy]
"free shared sites"
Reason: Slightly imprecise. '公益站' in this context refers to free API proxy nodes/sites. 'Free Shared Sites' is acceptable but 'free API proxy sites' (as in reference) is more accurate to the technical context. | Original refers to "公益站" specifically meaning public-benefit/free proxy-type sites; "shared sites" is vague and omits the proxy/API nuance. | Incorrect translation - 'Free Shared Sites' should be 'free API proxy sites' as per reference | Title misses key context. Should mention 'free API proxy sites' and 'free proxy alternatives that support Claude' per reference.
Major
[Accuracy]
"Claude alternatives"
Reason: Should be 'alternative sites for Claude' or 'free proxy alternatives that support Claude' per reference. Current phrasing is vague. | Source asks for "claude平替站" (alternative sites that can use Claude). "Claude alternatives" could be interpreted as alternative models instead of alternative sites. | Incomplete - should be 'free proxy alternatives that support Claude' per reference |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: Share 88 Public Proxy Navigation, Seeking Alternative for claude\nContent: Anti-gravity is blocked, when looking for alternatives I found the public proxy navigation on the forum, which lists 88 proxies. Wondering if there's an alternative for claude?", "zh-TW": "標題:分享88個公益站導航,尋找能取代claude的公益站\n內容:反重力被封鎖,想找替代時在網路上看到公益站導航,上面列出了88個公益站。想問看是否有能取代claude的公益站?", "ja": "タイトル:88の公益ステーションナビゲーションをシェア、claudeの代替を探す\n内容:反重力がブロックされ、代替を探しているときに掲示板で公益ステーションナビゲーションを見つけて88の公益ステーションがあります。claudeの代替はありますか?", "ko": "제목: 88개의 공익站在韩国语中不能直接翻译为"公益站",因此使用“免费API”作为通用术语。 "ko": "제목: 88개의 무료 API 네비게이션 공유, claude 대체 찾기\n내용: 반중력이 차단되어 대체를 찾고 있을 때 포럼에서 공유된 무료 API 네비게이션을 발견하고 88개의 무료 API가 있습니다. claude 대체는 있나요؟", "de": "Titel: Teile die Navigation zu 88 kostenlosen API-Stellen und suche nach einem Ersatz für claude\nInhalt: Die Antigravitation ist gesperrt, als ich nach einer Alternative suchte, fand ich auf dem Forum eine Navigation zu 88 kostenlosen API-Stellen. Interessiert daran, ob es einen Ersatz für claude gibt?", "fr": "Titre : Partagez la navigation vers 88 stations gratuites et cherchez un remplacement pour claude\nContenu : L’anti-gravitation est bloquée, alors que je cherchais une alternative, j’ai découvert sur le forum une navigation vers 88 stations gratuites. Sachez si il y a une alternative pour claude ?", "it": "Titolo: Condividi la navigazione per 88 API gratuite e cerca un sostituto per claude\nContenuto: La anti-gravità è bloccata, cercando una sostituzione ho trovato sul forum una navigazione per 88 API gratuite. Sapete se c'è un sostituto per claude?"} | Pass (no consensus errors) |
| Qwen/Qwen3-14B | Title: Share 88 Free Shared API Sites Navigation, Looking for Claude Alternatives Content: Anti-gravity is blocked, found this public proxy navigation while looking for alternatives on forums, with 88 free shared API sites. Are there any Claude alternatives? | Pass (no consensus errors) |
| Qwen/Qwen3.5-4B | Title: Sharing 88 free API sites and proxies, plus requests for Calude alternatives Content: Gravity was banned; while looking for an alternative on forums, I discovered this navigation of public proxy/community sites with 88 entries. Do you know any other Claude alternatives? | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | Title: Sharing navigation of 88 free shared API sites, looking for public proxy alternatives that can replace Claude Content: Anti-Gravity was banned; found this navigation while browsing forums when looking for alternatives. It includes 88 free shared API sites. Are there any other Claude alternatives? |
Minor
[Terminology]
"free shared API sites"
Reason: '公益站' in this context specifically refers to free proxy/API nodes provided by the community. 'Free shared API sites' is acceptable but 'free API proxy sites' (as in reference) is more precise regarding the function. | Source implies free/public welfare API proxy sites; 'shared API sites' slightly underspecifies the 'proxy' aspect but remains understandable. | Hypothesis uses 'free shared API sites' while reference uses 'free API proxy sites' - 'shared' is not in source. The source '公益站' means free/public API sites, not necessarily 'shared'. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Title: 88 Free Public Proxy Sites Navigation, Looking for Claude Replacement Sites |
Major
[Accuracy]
"88 Free Public Proxy Sites Navigation"
Reason: The hypothesis provides only a partial translation (title only) and lacks the content translation. It also doesn't capture the context about Antigravity being banned, which is crucial to understanding the post. | Source refers to a directory of 88 free public sites (公益站导航) generally, not explicitly "proxy" sites; adds "Proxy" not present in source. | The hypothesis only provides a translated title and completely omits the content text explaining the context (Antigravity banned, found on forum) and the specific question about Claude alternatives. | Incomplete translation: Only title translated, content missing; '公益站' should be 'free API proxy sites' not 'public proxy sites'; missing context about Antigravity being banned and forum browsing |
| google/gemma-3-12b-it | Title: Sharing navigation to 88 free API sites, also looking for alternatives to free API sites that can be used with Claude. Content: Anti-gravity was blocked, and I found a navigation list of free API sites while browsing forums when looking for alternatives. There are 88 free API sites. I wonder if there are any other alternative sites that can be used with Claude? | Pass (no consensus errors) |
| google/gemma-3-1b-it | Share 88 public API sites navigation, and find alternatives to calude's public API sites. |
Major
[Accuracy]
"public API sites"
Reason: The hypothesis omits the context provided in the source content about 'Antigravity' being banned and the discovery process on the forum. | Misspelling 'calude' instead of 'Claude'. Incomplete translation missing the context about Antigravity being banned and the forum discovery. Content translation is missing entirely. | Source refers to "公益站" (free/proxy sites), not specifically "API sites"; also omits the forum/browsing and Antigravity-banned context and the count "88" as a directory. |
| google/gemma-3-4b-it | "en": "Title: Share 88 Public Proxy Navigation Sites, and Looking for Alternatives to Calude’s Public Proxy Sites”, Content: Anti-gravity was banned, and when browsing forums after finding public proxy sites navigation, there were 88 public proxy sites. Would like to know if there are any Claude alternative proxy sites?", "zh-TW": "標題:分享88個公益站導航,尋找Calude公益站替代站”,內容:反重力被封鎖,在瀏覽論壇後發現的公益站導航,共有88個公益站。想知道還有Calude的替代站嗎?", "ja": "タイトル:88の公益サイトナビゲーションを共有、Caludeの公益サイト代替を探す”, “コンテンツ:反重力の機能が停止されました。フォーラムで公開プロキシサイトのナビゲーションを見つけたところ、88の公益サイトがありました。Caludeの代替サイトはありますか?", "ko": "제목: 88개의 공공 프록시 탐색 공유, 클라우드의 공공 프록시 대체 찾기”, “내용: 반중력 기능이 차단되었고, 포럼에서 공개 프록시 사이트 탐색을 발견하여 88개의 공공 프록시 사이트가 있었습니다. 클라우드 대체 프록시 사이트는 있습니까?", "de": "Titel: Teilen Sie 88 öffentliche Proxy-Navigationsseiten, suchen Sie nach Alternativen zu Cloud-öffentlichen Proxy-Sites”, “Inhalt: Die Funktion von Anti-Gravity wurde gesperrt. Nach dem Durchsuchen von Foren wurden 88 öffentliche Proxy-Sites gefunden. Gibt es Cloud-Alternative Proxy-Sites?", "fr": "Titre : Partagez 88 navigateurs de sites de proxy publics, recherchez des alternatives aux sites de proxy publics de Cloud”, “Contenu : La fonction anti-gravité a été bloquée. Après avoir parcouru les forums, 88 sites de proxy publics ont été découverts. Existe-t-il des sites de proxy alternatifs de Cloud ?”, "it": "Titolo: Condividi 88 siti di navigazione proxy pubblici, cerca alternative ai siti di proxy pubblici di Cloud”, “Contenuto: La funzione anti-gravità è stata bloccata. Dopo aver sfogliato i forum, sono stati trovati 88 siti di navigazione proxy pubblici. Esistono alternative ai siti di proxy pubblici di Cloud?” } <|im_end|> | Pass (no consensus errors) |
| google/translategemma-12b-it | Title: Sharing a navigation list of 88 public API sites, looking for alternatives that work with Claude. |
Minor
[Accuracy]
"public API sites"
Reason: Hypothesis only provides title, missing content translation. Reference provides both title and content. Also, '公益站' means 'free proxy sites' not 'public API sites'. | Incomplete translation. The hypothesis only translates the Title and omits the Content section ('反重力被封,想找替代时逛论坛发现的公益站导航,有 88 个公益站。想知道还有 claude 平替站么?'). | The term 'public' is less precise than the reference's 'free API proxy'. | Source "公益站导航" means public-welfare/free sites directory, not necessarily API-specific; "API" adds unwarranted specificity.
Major
[Terminology]
"public API sites"
Reason: '公益站' in this context refers to 'free proxy sites' or 'charity/free nodes' provided by the community, not necessarily 'public API sites' which implies an official public interface. 'Free proxy sites' (as in reference) is more accurate. | Should be 'free API proxy sites' or 'free proxy sites' - these are proxy services, not public APIs | Inaccurate translation: 'public API sites' should be 'free API proxy sites' - the source refers to '公益站' (public welfare sites/free proxy sites), and reference uses 'free API proxy sites' |
| google/translategemma-4b-it | Title: Sharing 88 Public Proxy Sites, and looking for alternatives to calude's public proxy sites. |
Major
[Accuracy]
"Public Proxy Sites"
Reason: Typo: 'calude' should be 'Claude' | Source talks about '公益站导航' and free/public sites in general, not specifically 'public proxy sites' only. | The hypothesis only translates the title and omits the content entirely. | The hypothesis omits the content section explaining the context (Antigravity banned) and the specific question about Claude alternatives. |
| tencent/HY-MT1.5-1.8B | Translation of 'Title: 分享88个公益站导航,再求能用calude的公益站平替' in en |
Critical
[Accuracy]
"Translation of 'Title: 分享88个公益站导航,再求能用calude的公益站平替' in en"
Reason: Hypothesis is completely unrelated to source - appears to be a placeholder or wrong translation entirely | The hypothesis is a meta-instruction describing a translation task for the wrong language direction (to English) and only covers the title, ignoring the content. | Merely a statement “Translation of ... in en” with no actual translation; key information from the title such as “分享…导航”, “公益站”, “平替”, and “Claude” is missing. | Hypothesis is a meta-description of the translation task rather than an actual translation. It should provide the English translation, not describe what should be translated. |
| tencent/HY-MT1.5-7B | Share navigation links for 88 public welfare sites and see if any can replace Claude’s public welfare sites. |
Major
[Accuracy]
"Share navigation links for 88 public welfare sites"
Reason: The hypothesis omits the context that 'Antigravity was banned' which prompted the search. It also mistranslates '公益站' as 'public welfare sites' (charity) instead of 'free API proxy sites' in this technical context. | Title phrasing is somewhat close but omits that these are API/proxy-like '公益站导航'; also does not convey that this is a directory discovered while searching for alternatives. | Hypothesis translates '公益站' as 'public welfare sites' but reference correctly contextualizes as 'free API proxy sites' |
| Model under test | Machine translation output (Hypothesis) | Arbitration verdict (Accepted Errors) |
|---|---|---|
| CohereLabs/tiny-aya-global | Title: :fire:** Save Money Series 6** L Station Google/Antigravity/Gemini Ultra Member Carpooling Experience |
Major
[Accuracy]
"L Station"
Reason: Source title should translate to '🔥 [Money-Saving Series 6] LINUX DO Forum Google/Antigravity/Gemini Ultra Subscription Sharing Experience'. Hypothesis uses 'L Station' instead of 'LINUX DO Forum' and 'Carpooling' instead of 'Subscription Sharing' | Failure to translate the specific community reference 'L 站'. It refers to 'LINUX DO' or a specific forum, not a physical station. | Under-specific translation of “L站”, which refers to a particular forum (LINUX DO); loses some referential clarity compared to the reference.
Major
[Accuracy]
"Member Carpooling Experience"
Reason: Missing content: The reference shows the full title includes '[Money-Saving Series 6]' which is missing in hypothesis; content body is entirely omitted | Omission of 'Subscription Sharing' from the reference | Should be 'Subscription Sharing Experience', not 'Carpooling Experience' | “拼车” in subscription context means sharing a subscription/account, not literal carpooling; this mistranslates the key concept. |
| CohereLabs/tiny-aya-water | Title: :fire:** Save Money Series 6** L Station Google/Antigravity/Gemini Ultra Member Carpooling Experience |
Major
[Terminology]
"L Station"
Reason: 'L 站' refers to 'LinuxDo' (a specific forum). 'L Station' is a confusing literal translation that loses the proper noun identity. | Source refers to “L站” (a specific forum, e.g., Linux Do); “L Station” is vague and nonstandard but partially understandable. | Incorrect terminology. 'L Station' should be 'LINUX DO Forum'. Reference correctly identifies this. | Incorrect entity name: 'L Station' should be 'LINUX DO Forum' (the platform name) |
| Qwen/Qwen2.5-14B-Instruct | "en": "Title: :fire:[Money-Saving Series 6] Carpooling Experience for L Station Google/Antigravity/Gemini Ultra Members\nContent: This is just a common situation and does not represent uniform rules; second-hand transactions follow the seller's agreement. The following prices may fluctuate by 10-20%, depending on exchange rates, supply and demand, and the car owner's assessment of risk and time cost. Cost price is rare and should not be overly focused on; the car owner has no obligation to sell at cost price.", "zh-TW": "標題::fire:[省錢系列6] L站Google/Antigravity/Gemini Ultra會員拼車經驗\n內容:這只是常見情況,不代表統一規則;二手交易依照賣家約定。以下價格可能因匯率、供應與需求及車主對風險和時間成本評估而上下波動10-20%,成本價難得,不必過於執著,車主沒有義務一定要按成本價出售。", "ja": "タイトル::fire:[節約シリーズ6] LステーションGoogle/Antigravity/Gemini Ultra会員のカープール経験\nコンテンツ:これは一般的な状況であり、一律のルールを示しているわけではありません;中古取引は売主の合意に従います。以下の価格は為替レート、需給関係、および運転手がリスクと時間コストを評価した結果、10〜20%程度変動することがあります。原価は稀で、あまり気にする必要はありません;ドライバーには必ずしも原価で販売する義務はありません。", "ko": "제목: :fire:[절약 시리즈 6] L스테이션 Google/Antigravity/Gemini Ultra 회원 캐러пу르 경험\n내용: 이는 일반적인 상황이며 일률적인 규칙을 나타내는 것은 아닙니다; 중고 거래는 판매자의 약정에 따릅니다. 아래의 가격은 환율, 공급과 수요, 그리고 차주가 위험과 시간 비용을 평가한 결과로 10~20% 정도 변동될 수 있습니다. 원가는 드물며 너무 집착하지 마세요; 차주는 반드시 원가로 판매할 의무가 없습니다.", "de": "Titel: :fire:[Sparsamkeit Serie 6] Mitfahrgelegenheiten für L-Station-Mitglieder bei Google/Antigravity/Gemini Ultra\nInhalt: Dies ist nur eine häufige Situation und stellt keine allgemeinen Regeln dar; Gebrauchtwarentransaktionen unterliegen den Vereinbarungen des Verkäufers. Die folgenden Preise können je nach Wechselkurs, Nachfrage und Angebot sowie der Bewertung durch den Fahrer von Risiko und Zeitkosten um bis zu 10-20 % variieren. 
Der Kostenpreis ist selten und sollte nicht übermäßig wichtig sein; der Fahrer hat keinen Pflichten, den Kostenpreis einzuhalten.", "fr": "Titre: :fire:[Série Économie d'argent 6] Expérience de covoiturage pour les membres de la station L Google/Antigravity/Gemini Ultra\nContenu: C'est une situation courante et ne représente pas un ensemble de règles uniformes ; les transactions d'occasion suivent l'accord du vendeur. Les prix ci-dessous peuvent varier de 10 à 20 % selon le taux de change, l'offre et la demande ainsi que l'évaluation par le propriétaire de la voiture du risque et du coût temporel. Le prix de revient est rare et il n'est pas nécessaire de s'y attacher trop ; le propriétaire de la voiture n'a pas l'obligation de vendre au prix de revient.", "it": "Titolo: :fire:[Serie per risparmiare 6] Esperienza di condivisione della macchina per i membri della stazione L Google/Antigravity/Gemini Ultra\nContenuto: Questo è solo una situazione comune e non rappresenta regole uniformi; le transazioni usate seguono l'accordo del venditore. I seguenti prezzi possono variare fino al 10-20% a causa dei tassi di cambio, dell'offerta e della domanda e dell'attitudine del proprietario della macchina nei confronti del rischio e del costo temporale. Il prezzo di costo è raro e non dovrebbe essere preso troppo sul serio; il |
Major
[Accuracy]
"L Station"
Reason: Should be 'LINUX DO Forum' per reference, not 'L Station'. | Within the English portion of the erroneous multi-language output, 'L 站' is mistranslated as 'L Station'. In this community context, it refers to 'LINUX DO Forum' (or simply 'L Forum'), not a physical station. | Incorrect entity - 'L Station' should be 'LINUX DO Forum' as per reference | Title refers to "L站" which in reference is "LINUX DO Forum"; "L Station" is a literal but misleading rendering of the forum name. |
| Qwen/Qwen2.5-7B-Instruct | "en": "Title: 🔥【Save Money Series 6】L Station Google/Antigravity/Gemini Ultra Membership Carpooling Experience\nContent: This is just a common situation and does not represent unified rules. Second-hand transactions are subject to the seller's agreement. Prices may fluctuate based on ① exchange rate ② supply and demand relationship ③ the car owner's assessment of risk and time cost, with an increase of 10-20% being normal. Cost price is rare, so don't overthink it; the driver has no obligation to sell at the cost price.", "zh-TW": "Title: 🔥【省錢系列6】L站Google/Antigravity/Gemini Ultra會員共乘經驗\nContent: 此僅為常見情形,不代表統一規則。二手交易以賣家約定為準。價格可能因①匯率②供給與需求關係③車主對風險及時間成本的評估而上下波動,漲幅10-20%屬正常範圍。成本價難得一見,勿過於斤斤計較;車主沒有義務一定要按成本價售出", "ja": "タイトル:🔥【節約シリーズ6】LステーションGoogle/Antigravity/Gemini Ultraメンバーシップカープーリング体験\n内容:これは一般的な状況であり、統一ルールを代表するものではありません。二度目の取引は販売者の約束に基づきます。価格は①為替レート②供給と需要の関係③ドライバーがリスクと時間コストをどのように評価するかにより変動し、10〜20%の上昇も通常です。コスト価格は珍しいので、あまり気にしないでください;ドライバーには必ずコスト価格で売る義務はありません", "ko": "제목: 🔥【돈 절약 시리즈 6】L 스테이션 Google/Antigravity/Gemini Ultra 멤버십 카풀 경험\n내용: 이는 일반적인 상황이며 일관된 규칙을 대표하지 않습니다. 두 번째 거래는 판매자의 약속에 따라 이루어집니다. 가격은 ① 환율 ② 공급과 수요 관계 ③ 운전자가 위험과 시간 비용을 어떻게 평가하는지에 따라 변동하며, 10~20%의 상승도 정상입니다. 원가를 지키는 것은 드물므로 너무 신경 쓰지 마세요; 운전자는 반드시 원가로 팔아야 할 의무는 없습니다", "de": "Titel: 🔥【Ersparnis-Reihe 6】L-Station Google/Antigravity/Gemini Ultra Mitglieder-Carpool-Erfahrung\nInhalt: Dies ist nur ein typisches Beispiel und repräsentiert keine einheitlichen Regeln. Zweite Hand-Transaktionen sind nach den Vereinbarungen des Verkäufers zu verstehen. Die Preise können aufgrund von ① Wechselkurs ② Angebot und Nachfrage ③ der Einschätzung des Fahrers für Risiko und Zeitkosten um 10 bis 20 % fluktuieren, was auch als normal angesehen werden kann. 
Kostenpreise sind selten, also denke nicht zu sehr darüber nach; der Fahrer hat keine Pflicht, die Kostenpreise zu verkaufen.", "fr": "Titre: 🔥【Série d'économies 6】Expérience de covoiturage membre L Station Google/Antigravity/Gemini Ultra\nContenu: C'est simplement une situation courante et ne représente pas les règles unifiées. Les transactions de seconde main sont soumises aux accords du vendeur. Les prix peuvent varier en fonction de ① le taux de change ② l'offre et la demande ③ l'évaluation par le conducteur du risque et du coût du temps, avec une augmentation de 10 à 20 % étant considérée comme normale. Le prix de coûts est rare, donc n'en faites pas trop d'un point de vue financier ; le conducteur n'a pas l'obligation de vendre au prix de coûts.", "it": "Titolo: 🔥【Serie di Economia 6】Esperienza di Carpooling Membro L Station Google/Antigravity/Gemini Ultra\nContenuto: Questo è solo una situazione comune e non rappresenta regole uniformi. Le transazioni secondarie sono |
Minor
[Terminology]
"L Station"
Reason: Source "L站" refers to the LINUX DO forum; "L Station" is understandable but less precise than "LINUX DO Forum" in this context. | Should be 'LINUX DO Forum' - 'L站' is short for LinuxDo forum | Imprecise translation: 'L 站' refers to 'LinuxDo Forum'. 'L Station' is an unnatural literal translation that loses the specific community reference. |
| Qwen/Qwen3-14B | :fire: [Série d'économies 6] Expérience de covoiturage pour membres L Station Google/Antigravity/Gemini Ultra |
Major
[Accuracy]
"L Station"
Reason: Source "L站" refers to a specific forum (Linux DO); "L Station" is misleading and not aligned with reference. | 'L 站' refers to 'Linux Do' (a specific forum community). 'L Station' is a literal and confusing translation that loses the community context. | Hypothesis is in French, but target language is zh (Chinese). Should be translated to Chinese, not French. |
| Qwen/Qwen3.5-4B | Title: :fire: [Save Money Series 6] Carpooling Experience for L Station Google/Antigravity/Gemini Ultra Members Content: This is just a common situation and does not represent unified rules. Second-hand transactions are subject to seller agreements. The following prices fluctuate based on ① exchange rates, ② supply and demand, ③ valuation of risk and time costs by the vehicle owner. A 10-20% increase is also normal. Cost price is rare; do not obsess over it. Vehicle owners have no obligation to sell at cost. | Pass (no consensus errors) |
| Qwen/Qwen3.5-9B | :fire: [Saving Series 6] L-Station Google/Antigravity/Gemini Ultra Membership Carpooling Experience |
Major
[Terminology]
"L-Station"
Reason: Should reflect the community/forum name more precisely (e.g., LINUX DO forum) rather than an invented term 'L-Station'. | L站 refers to 'LINUX DO Forum', not 'L-Station'. Reference correctly identifies as 'LINUX DO Forum'. | Critical error: 'L站' refers to 'LINUX DO Forum' (a specific forum), not 'L-Station'. This is a major mistranslation of the entity name.
Minor
[Terminology]
"Membership Carpooling Experience"
Reason: 'Carpooling' is understandable slang for subscription sharing but deviates slightly from more standard 'subscription sharing'; meaning is preserved. | 'Membership' should be 'Subscription' per reference. Also 'Carpooling' is used metaphorically for account sharing but reference uses 'Subscription Sharing' which is more accurate. | '会员' in the context of digital services is often 'Subscription' rather than 'Membership', though both are acceptable. 'Subscription' fits the 'sharing' context better. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Translation of ':fire:【省钱系列6】L站Google/Antigravity/Gemini Ultra会员拼车经验' |
Critical
[Accuracy]
"Translation of ':fire:【省钱系列6】L站Google/Antigravity/Gemini Ultra会员拼车经验'"
Reason: No actual translation is provided, only a meta description; fails to translate the title content itself. | The hypothesis does not provide an actual translation, only a meta-statement about translating. No content translation is provided, making this a complete failure to deliver the required translation. | The hypothesis is a meta-statement describing the task ('Translation of...') rather than providing the actual translation of the source text. It fails to translate the title or the content. | Not a translation - just states 'Translation of' followed by source text; entire content missing |
| google/gemma-3-12b-it | "en": "Title: 🔥[Saving Series 6] L Station Google/Antigravity/Gemini Ultra Membership Carpool Experience\nContent: This is just a common situation and does not represent a unified rule. It’s a secondhand transaction, subject to agreement between the seller and buyer. The following prices fluctuate based on ① exchange rates ② supply and demand ③ the car owner's assessment of risk and time cost, and an increase of 10~20% is considered normal. Cost price is rare, so don't be too fixated on it; the car owner has no obligation to sell at cost.", "zh-TW": "標題:🔥[省錢系列6] L站Google/Antigravity/Gemini Ultra會員拼車經驗\n內容:這只是一般情況,並不代表統一規則。屬於二手交易,以賣家約定為準。以下價格會根據①匯率、②供需關係、③車頭對風險和時間成本的評估而波動,上漲10~20%也算正常。成本價難得一見,不要太過糾結,車主沒有義務一定要以成本價出售。", "ja": "タイトル:🔥[節約シリーズ6] LステーションGoogle/Antigravity/Gemini Ultra会員権カープール体験\n内容:これは一般的な状況であり、統一されたルールを表すものではありません。中古取引であり、売り手と買い手の合意に基づきます。以下の価格は、①為替レート、②需要と供給、③カーオーナーがリスクと時間コストを評価し、10〜20%の上乗せも一般的です。原価はまれなので、こだわりすぎないようにしましょう。カーオーナーは必ずしも原価で販売する義務はありません。", "ko": "제목: 🔥[절약 시리즈 6] L 스테이션 Google/Antigravity/Gemini Ultra 멤버십 카풀 경험\n내용: 이는 일반적인 상황일 뿐이며 통일된 규칙을 나타내는 것은 아닙니다. 중고 거래이므로 판매자와 구매자의 합의에 따릅니다. 다음 가격은 ①환율, ②수요와 공급, ③차주가 위험과 시간 비용을 평가하여 10~20% 인상도 정상입니다. 원가는 드물기 때문에 너무 애쓰지 마세요. 차주는 반드시 원가로 판매할 의무는 없습니다.", "de": "Titel: 🔥[Sparreihe 6] L Station Google/Antigravity/Gemini Ultra Mitgliedschaft Fahrgemeinschaftserfahrung\nInhalt: Dies ist nur eine übliche Situation und stellt keine einheitliche Regel dar. Es handelt sich um einen Gebrauchtartikelkauf, der dem Abkommen zwischen Verkäufer und Käufer unterliegt. Die folgenden Preise schwanken je nach ① Wechselkursen, ② Angebot und Nachfrage sowie ③ der Einschätzung von Risiko und Zeitkosten durch den Fahrer, wobei eine Erhöhung von 10–20 % als normal gilt. 
Der Einkaufspreis ist selten, daher sollten Sie sich nicht zu sehr darauf versteifen; der Fahrer ist nicht verpflichtet, zum Einkaufspreis zu verkaufen.", "fr": "Titre : 🔥[Série d’économies 6] Expérience de covoiturage pour les membres Google/Antigravity/Gemini Ultra sur L Station\nContenu : Il s’agit uniquement d’une situation courante et ne représente pas une règle uniforme. Il s’agit d’une transaction de seconde main, soumise à l’accord entre le vendeur et l’acheteur. Les prix suivants varient en fonction de ① des taux de change, ② de l’offre et de la demande, et ③ de l’évaluation du risque et du coût du temps par le chauffeur, et une augmentation de 10 à 20 % est considérée comme normale. Le prix de revient est rare, alors ne vous y attachez pas trop ; le chauffeur n’est pas obligé de vendre au prix de revient.", "it": "Titolo: 🔥[Serie Risparmio 6] Esperienza di Carpooling per i Membri Google/Antigravity/Gemini Ultra su L Station\nContenuto: Questa è solo una situazione comune e non rappresenta una regola unificata. Si tratta di una transazione di seconda mano, soggetta all'accordo tra venditore e acquirente. I seguenti prezzi variano in base a ① tassi di cambio, ② domanda e offerta, ③ valutazione del rischio e dei costi di tempo da parte del conducente, ed un aumento del 10-20% è considerato normale. Il costo di acquisto è raro, quindi non fissatevi troppo; il conducente non ha l'obbligo di vendere al costo | 通过 (无共识错误) |
| google/gemma-3-1b-it | This is just common experience, not a uniform rule. Used car transactions, prices are agreed upon by seller. Below price varies based on ① exchange rate ② demand ③ vehicle valuation, up to 10%-20% is considered normal. Cost price is unavailable and difficult to find, don't overthink it, the owner has no obligation to sell at cost price. | Pass (no consensus errors) |
| google/gemma-3-4b-it | Title: 🔥【Saving Series 6】L Station Google/Antigravity/Gemini Ultra Ride Sharing Experience | Pass (no consensus errors) |
| google/translategemma-12b-it | "en": "Title: :fire: [Cost-Saving Series 6] Sharing Experience for L Platform Google/Antigravity/Gemini Ultra Memberships\nContent: This is just a common situation and does not represent a unified rule. It's a second-hand transaction, so follow the agreement between the buyer and seller. The following prices fluctuate based on ① exchange rates, ② supply and demand, ③ the seller's assessment of risks and time costs, and an increase of 10~20% is considered normal. Cost prices are rare, so don't be too concerned. Sellers are not obligated to sell at cost.", "zh-TW": "標題::fire: 【省錢系列6】L平台 Google/Antigravity/Gemini Ultra 會員共享經驗\n內容:這只是一般情況,並不代表統一規則。屬於二手交易,請依照買賣雙方約定為準。以下價格會根據①匯率、②供需關係、③車頭對風險和時間成本的評估而波動,上漲10~20%也是正常的。原價難得一見,不必過於糾結,車主沒有義務一定按照原價出售。", "ja": "タイトル: :fire: 【節約シリーズ6】LプラットフォームのGoogle/Antigravity/Gemini Ultra会員共有体験\n内容: これは一般的な状況であり、統一されたルールを表すものではありません。中古取引であるため、購入者と販売者の合意に従ってください。以下の価格は、①為替レート、②需要と供給、③販売者がリスクと時間コストを評価した結果によって変動します。10〜20%の上乗せも一般的です。原価はまれなので、あまり気にしないでください。販売者は必ず原価で販売する義務はありません。", "ko": "제목: :fire: [절약 시리즈 6] L 플랫폼의 Google/Antigravity/Gemini Ultra 멤버십 공유 경험\n내용: 이는 일반적인 상황이며, 통일된 규칙을 의미하지 않습니다. 중고 거래이므로, 구매자와 판매자의 합의에 따릅니다. 아래 가격은 ①환율, ②수요와 공급, ③판매자가 위험과 시간 비용을 평가한 결과에 따라 변동합니다. 10~20% 인상도 일반적입니다. 원가는 구하기 어려우므로 너무 신경 쓰지 마세요. 판매자는 반드시 원가로 판매할 의무는 없습니다.", "de": "Titel: :fire: [Sparreihe 6] Erfahrungsaustausch zum Teilen von Mitgliedschaften bei L Platform für Google/Antigravity/Gemini Ultra\nInhalt: Dies ist nur eine typische Situation und stellt keine einheitliche Regel dar. Es handelt sich um einen Gebrauchtartikelkauf, daher gelten die Vereinbarungen zwischen Käufer und Verkäufer. Die folgenden Preise variieren je nach ① Wechselkurs, ② Angebot und Nachfrage sowie ③ der Einschätzung des Verkäufers hinsichtlich Risiken und Zeitkosten. Eine Erhöhung von 10–20 % ist üblich. Der Einkaufspreis ist selten zu finden, also machen Sie sich nicht zu viele Gedanken darüber. 
Verkäufer sind nicht verpflichtet, zu den Einkaufspreisen zu verkaufen.", "fr": "Titre : :fire: [Série Économies 6] Expérience partagée pour les abonnements Google/Antigravity/Gemini Ultra sur la plateforme L\nContenu : Il s'agit uniquement d'une situation courante et ne représente pas une règle uniforme. Il s'agit d'une transaction de seconde main, donc suivez l'accord entre l'acheteur et le vendeur. Les prix suivants varient en fonction de ① des taux de change, ② de l'offre et de la demande, et ③ de l'évaluation par le vendeur des risques et des coûts liés au temps. Une augmentation de 10 à 20 % est considérée comme normale. Les prix de revient sont rares, alors ne vous inquiétez pas trop. Les vendeurs ne sont pas obligés de vendre au prix de revient.", "it": "Titolo: :fire: [Serie Risparmio 6] Condivisione di esperienze per gli abbonamenti Google/Antigravity/Gemini Ultra sulla piattaforma L\nContenuto: Questa è solo una situazione comune e non rappresenta una regola univoca. Si tratta di una transazione di seconda mano, quindi seguire l'accordo tra acquirente e venditore. I seguenti prezzi variano in base a ① tassi di cambio, ② domanda e offerta, e ③ valutazione da parte del venditore dei rischi e dei costi legati al tempo. Un aumento del 10-20% è considerato normale. I prezzi di costo |
Minor
[Accuracy]
"L Platform"
Reason: Hypothesis provides only English translation, but source is Chinese and reference indicates this should be 'LINUX DO Forum' not 'L Platform'. Also missing other language versions context. | 'L 站' is specific slang for 'Linux Do' (a forum). The reference correctly identifies it as 'LINUX DO Forum'. 'L Platform' is too vague and loses the specific community reference. | Source "L站" is a specific forum/site (LINUX DO in reference). "Platform" is generic and loses that nuance, though meaning is broadly understandable. |
| google/translategemma-4b-it | Title: 🔥【Saving Series 6】L Station Google/Antigravity/Gemini Ultra Carpooling Experience |
Major
[Accuracy]
"L Station"
Reason: 'L 站' is a specific nickname for 'LINUX DO Forum'. Translating it as 'L Station' loses the specific community reference understood by the target audience. | The hypothesis only translates the title and omits the content entirely. | Likely refers to 'LINUX DO' forum (L站) rather than generic 'L Station', losing specific reference. |
| tencent/HY-MT1.5-1.8B | Translation of '拼车' in en |
Critical
[Accuracy]
"Translation of '拼车' in en"
Reason: The hypothesis is a meta-instruction describing a translation task for a single word ('拼车') in the wrong language direction (to English), failing to translate the full source title and content into Chinese. | Hypothesis is completely unrelated to source - appears to be a placeholder or wrong translation entirely | Hypothesis is a meta-description of the translation task rather than an actual translation. It should provide the English translation of '拼车' (subscription sharing/ride-sharing), not describe what should be translated. | Only placeholder text is given; '拼车' (in this context, group buying or sharing a subscription) is not actually translated. |
| tencent/HY-MT1.5-7B | "en": "Title: :fire:[Saving Money Series 6] Sharing Car Riding Experiences for L Station Google/Antigravity/Gemini Ultra Members", "zh-TW": "Title: :fire:[Saving Money Series 6] Sharing Car Riding Experiences for L Station Google/Antigravity/Gemini Ultra Members", "ja": "Title: :fire:[Saving Money Series 6] Sharing Car Riding Experiences for L Station Google/Antigravity/Gemini Ultra Members", "ko": "Title: :fire:[Saving Money Series 6] Sharing Car Riding Experiences for L Station Google/Antigravity/Gemini Ultra Members", "de":="Titel: :fire:[Sparsamkeitsreihe 6] Teilung von Erfahrungen mit gemeinsamen Fahrten für L Station Google/Antigravity/Gemini Ultra Mitglieder", "fr": "Titre: :fire:[Série d’économies 6] Partage d’expériences de co-voiturage pour les membres de L Station Google/Antigravity/Gemini Ultra", "it": "Titolo: :fire:[Serie di risparmio 6] Condivisione di esperienze di carpooling per i membri di L Station Google/Antigravity/Gemini Ultra"} |
Major
[Terminology]
"L Station"
Reason: Hypothesis uses 'L Station' but reference correctly identifies this as 'LINUX DO Forum' | Incorrect entity: 'L站' refers to 'LINUX DO Forum', not 'L Station' | The reference specifies 'LINUX DO Forum'; rendering as 'L Station' loses the specific forum/brand name. | 'L 站' refers to 'LINUX DO Forum'. 'L Station' is an unnatural literal translation. |