This also shows nicely how the file size doubles with UTF-16.
With UTF-8, by contrast, the size grows only minimally, depending on the number of special characters (1051493 bytes for ANSI or OEM, 1051583 bytes for UTF-8).
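The size effect can be reproduced on a short string. The following is only a minimal sketch: the sample text and the variable names are made up for illustration, and it assumes a PowerShell session in which code page 850 is available (e.g. Windows PowerShell).

```powershell
# Hypothetical sample text "Straße und Bär": 14 characters, 2 of them non-ASCII
$text = "Stra" + [char]0x00DF + "e und B" + [char]0x00E4 + "r"

$oem850 = [System.Text.Encoding]::GetEncoding(850)
$utf8   = [System.Text.Encoding]::GetEncoding(65001)
$utf16  = [System.Text.Encoding]::GetEncoding(1200)

# GetByteCount returns the size of the encoded text without a byte order mark (BOM)
$sizeOem850 = $oem850.GetByteCount($text)   # 1 byte per character
$sizeUtf8   = $utf8.GetByteCount($text)     # 1 byte per ASCII character, 2 for ß and ä
$sizeUtf16  = $utf16.GetByteCount($text)    # 2 bytes per character

"OEM850: $sizeOem850 bytes, UTF-8: $sizeUtf8 bytes, UTF-16: $sizeUtf16 bytes"
```

For this string the result is 14, 16 and 28 bytes. When a file is written with WriteAllText and one of these encoding objects, a BOM is added on top: 3 bytes for UTF-8 and 2 bytes for UTF-16.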
These and further code page numbers for other conversions are listed under: Code Page Identifiers
These numbers are entered in the script in the two lines with $sourceEncoding= and $targetEncoding=.
The common code pages and Unicode numbers are included in the scripts below; enter others there as needed. As with all conversions, keep in mind that code pages have only a limited character repertoire of usually 256 characters. Missing characters (i.e. characters of the source code page that do not exist in the target code page) are rendered as a question mark, space, underscore or similar, depending on the converter. Any code page can be converted to Unicode. The reverse direction, however, works completely only if the code page of the target encoding can hold and represent all characters of the Unicode source.
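What happens to a missing character can be tried out directly. This is only a sketch with made-up sample text; it explicitly requests "?" as the replacement so that the result is predictable. Without that, the default fallback of GetEncoding may substitute a similar-looking character instead.

```powershell
# Ask for "?" as replacement for characters the code page cannot represent
$encoderFallback = New-Object System.Text.EncoderReplacementFallback -ArgumentList "?"
$decoderFallback = [System.Text.DecoderFallback]::ReplacementFallback
$oem850 = [System.Text.Encoding]::GetEncoding(850, $encoderFallback, $decoderFallback)

# Hypothetical sample "10 € für Bär": U+20AC (euro sign) is not part of code page 850,
# U+00FC (ü) and U+00E4 (ä) are
$text = "10 " + [char]0x20AC + " f" + [char]0x00FC + "r B" + [char]0x00E4 + "r"

# Encode to code page 850 and decode again to see what survived
$bytes     = $oem850.GetBytes($text)
$roundTrip = $oem850.GetString($bytes)

$roundTrip              # 10 ? für Bär
$lossy = $roundTrip -ne $text
$lossy                  # True, because the euro sign was lost
```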
The conversion is performed via the .NET Framework:
Encoding.GetEncoding Method (String)
File.ReadAllText Method (String, Encoding)
File.WriteAllText Method (String, String, Encoding)
EncodingInfo.Name Property
Other links
Character Encoding
International Components for Unicode (ICU)
.NET Source Browser
OEM850 to UTF-8
Code:
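Function Convert_OEM850_UTF8
{
# Resolve the file passed as argument to its full path
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
# 850 = OEM (DOS) Western Europe, 65001 = UTF-8
$sourceEncoding = [System.Text.Encoding]::GetEncoding(850)
$targetEncoding = [System.Text.Encoding]::GetEncoding(65001)
# Target name: source name plus suffix, e.g. data.txt -> data_UTF8.txt
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_UTF8" + [System.IO.Path]::GetExtension($args)
# New-Item reports an error if the target file already exists
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
# Decode the file with the source encoding, then write it with the target encoding
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}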
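The conversion functions in this post differ only in the two code page numbers and the file name suffix. A generic variant could look roughly like the sketch below. The function name Convert-TextFileEncoding and its parameters are hypothetical and not part of the original scripts. Unlike the New-Item approach, this sketch silently overwrites an existing target file.

```powershell
Function Convert-TextFileEncoding([string]$Path, [int]$SourceCodePage, [int]$TargetCodePage, [string]$Suffix)
{
    # Resolve to a full path so the target lands next to the source file
    $inputFile = (Resolve-Path $Path).Path
    $folder    = Split-Path -Parent $inputFile

    $sourceEncoding = [System.Text.Encoding]::GetEncoding($SourceCodePage)
    $targetEncoding = [System.Text.Encoding]::GetEncoding($TargetCodePage)

    # Target name: source name plus suffix, e.g. data.txt -> data_UTF8.txt
    $name = [System.IO.Path]::GetFileNameWithoutExtension($inputFile) + $Suffix + [System.IO.Path]::GetExtension($inputFile)
    $outputFile = Join-Path $folder $name

    # Decode with the source encoding, write with the target encoding
    $text = [System.IO.File]::ReadAllText($inputFile, $sourceEncoding)
    [System.IO.File]::WriteAllText($outputFile, $text, $targetEncoding)

    # Return the path of the converted file
    $outputFile
}
```

A call such as Convert-TextFileEncoding .\data.txt 850 65001 "_UTF8" would then correspond to Convert_OEM850_UTF8 (data.txt is a placeholder file name).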
New 04.10.2017: OEM850 to UTF-8 with "Open file" dialog window
Code:
Function Convert_OEM850_UTF8_Dlg
{
$ErrorActionPreference = "Stop"
$PSDefaultParameterValues['*:ErrorAction']='Stop'
Function Get-FileName($initialDirectory)
{
# Load Windows Forms for the file dialog (LoadWithPartialName is obsolete)
Add-Type -AssemblyName System.Windows.Forms
$OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
$OpenFileDialog.initialDirectory = $initialDirectory
$OpenFileDialog.filter = "TXT (*.txt)|*.txt|CSV (*.csv)|*.csv|All files|*.*"
$OpenFileDialog.ShowDialog() | Out-Null
$OpenFileDialog.filename
}
$inputfile = Get-FileName -initialDirectory "$env:HOMEDRIVE\temp"
if ($inputfile -eq "") {throw 'Please select a file'}
$WorkingFolder = Split-Path -Parent $inputfile
$sourceEncoding = [System.Text.Encoding]::GetEncoding(850)
$targetEncoding = [System.Text.Encoding]::GetEncoding(65001)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($inputfile) +"_UTF8" + [System.IO.Path]::GetExtension($inputfile)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($inputfile, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $inputfile 'converted to' $convertedFileName
}
OEM852 to UTF-8 with "Open file" dialog window (only for Central and Eastern Europe: PL, CZ, RO, HU, RS, SK etc.)
Code:
Function Convert_OEM852_UTF8_Dlg
{
$ErrorActionPreference = "Stop"
$PSDefaultParameterValues['*:ErrorAction']='Stop'
Function Get-FileName($initialDirectory)
{
# Load Windows Forms for the file dialog (LoadWithPartialName is obsolete)
Add-Type -AssemblyName System.Windows.Forms
$OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
$OpenFileDialog.initialDirectory = $initialDirectory
$OpenFileDialog.filter = "TXT (*.txt)|*.txt|CSV (*.csv)|*.csv|All files|*.*"
$OpenFileDialog.ShowDialog() | Out-Null
$OpenFileDialog.filename
}
$inputfile = Get-FileName -initialDirectory "$env:HOMEDRIVE\temp"
if ($inputfile -eq "") {throw 'Please select a file'}
$WorkingFolder = Split-Path -Parent $inputfile
$sourceEncoding = [System.Text.Encoding]::GetEncoding(852)
$targetEncoding = [System.Text.Encoding]::GetEncoding(65001)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($inputfile) +"_UTF8" + [System.IO.Path]::GetExtension($inputfile)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($inputfile, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $inputfile 'converted to' $convertedFileName
}
OEM850 to UTF-16 (GetEncoding(1200) = little-endian; for big-endian use GetEncoding(1201) instead)
Code:
Function Convert_OEM850_UTF16
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(850)
$targetEncoding = [System.Text.Encoding]::GetEncoding(1200)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_UTF16" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
Edit 04.01.2017: For UTF-16 without a BOM (byte order mark), see the variant here.
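The linked variant is not reproduced here. As a rough sketch of one possible approach (not necessarily the one used there), the .NET class UnicodeEncoding can be constructed with the BOM switched off and used as $targetEncoding. The variable names and the test text "AB" are made up for illustration.

```powershell
# UnicodeEncoding(bigEndian, byteOrderMark): little-endian UTF-16 without BOM
$utf16NoBom   = New-Object System.Text.UnicodeEncoding -ArgumentList $false, $false
# For comparison: GetEncoding(1200) writes a BOM
$utf16WithBom = [System.Text.Encoding]::GetEncoding(1200)

$tempFile = [System.IO.Path]::GetTempFileName()

[System.IO.File]::WriteAllText($tempFile, "AB", $utf16WithBom)
$sizeWithBom = (Get-Item $tempFile).Length   # 2 bytes BOM + 4 bytes text

[System.IO.File]::WriteAllText($tempFile, "AB", $utf16NoBom)
$sizeNoBom = (Get-Item $tempFile).Length     # 4 bytes text only

Remove-Item $tempFile
"With BOM: $sizeWithBom bytes, without BOM: $sizeNoBom bytes"
```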
OEM850 to ANSI1252
Code:
Function Convert_OEM850_ANSI1252
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(850)
$targetEncoding = [System.Text.Encoding]::GetEncoding(1252)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_ANSI1252" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
OEM850 to ISO 8859-1
Code:
Function Convert_OEM850_ISO88591
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(850)
$targetEncoding = [System.Text.Encoding]::GetEncoding(28591)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_ISO8859_1" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
ANSI1252 to UTF-8
Code:
Function Convert_ANSI1252_UTF8
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(1252)
$targetEncoding = [System.Text.Encoding]::GetEncoding(65001)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_UTF8" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
ISO 8859-1 to UTF-8
Code:
Function Convert_ISO88591_UTF8
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(28591)
$targetEncoding = [System.Text.Encoding]::GetEncoding(65001)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_UTF8" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
ISO 8859-1 to ANSI1252
Code:
Function Convert_ISO88591_ANSI1252
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(28591)
$targetEncoding = [System.Text.Encoding]::GetEncoding(1252)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_ANSI1252" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
OEM850 to EBCDIC500 (IBM mainframes)
Code:
Function Convert_OEM850_EBCDIC500
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(850)
$targetEncoding = [System.Text.Encoding]::GetEncoding(500)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_EBCDIC500" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
Back-conversion
If the Unicode files have been changed, check before converting back whether all characters exist in the character repertoire of the code page; otherwise they cannot be represented there.
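One way to run such a check is to encode the text with an exception fallback, so that .NET raises an error instead of silently replacing a character. This is only a sketch; the helper name Test-EncodableInCodePage is hypothetical. Text containing decomposed characters should be normalized to FormC first, as the back-conversion scripts do.

```powershell
Function Test-EncodableInCodePage([string]$Text, [int]$CodePage)
{
    # With exception fallbacks, GetBytes throws on the first unmappable character
    $encoderFallback = [System.Text.EncoderFallback]::ExceptionFallback
    $decoderFallback = [System.Text.DecoderFallback]::ExceptionFallback
    $strict = [System.Text.Encoding]::GetEncoding($CodePage, $encoderFallback, $decoderFallback)
    try
    {
        $strict.GetBytes($Text) | Out-Null
        return $true
    }
    catch [System.Text.EncoderFallbackException]
    {
        return $false
    }
}

# "Bär" fits into code page 850, the euro sign (U+20AC) does not
$fits    = Test-EncodableInCodePage ("B" + [char]0x00E4 + "r") 850
$fitsNot = Test-EncodableInCodePage ([string][char]0x20AC) 850
"Bär: $fits, euro sign: $fitsNot"
```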
UTF-8 to OEM850
Code:
Function Convert_UTF8_OEM850
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(65001)
$targetEncoding = [System.Text.Encoding]::GetEncoding(850)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_OEM850" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
# Normalize to composed form (NFC): a special character stored as base letter + combining mark has no equivalent in the code page
$textfile = $textfile.Normalize([Text.NormalizationForm]::FormC)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
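Why the Normalize call matters can be shown with a single character. The sketch below uses made-up variable names and explicitly requests "?" as the replacement for unmappable characters so that the result is predictable.

```powershell
$encoderFallback = New-Object System.Text.EncoderReplacementFallback -ArgumentList "?"
$decoderFallback = [System.Text.DecoderFallback]::ReplacementFallback
$oem850 = [System.Text.Encoding]::GetEncoding(850, $encoderFallback, $decoderFallback)

# "ä" stored in decomposed form: base letter "a" followed by U+0308 (combining diaeresis)
$decomposed = "a" + [char]0x0308
# FormC merges both into the single character U+00E4
$composed = $decomposed.Normalize([System.Text.NormalizationForm]::FormC)

$decomposed.Length   # 2
$composed.Length     # 1

# Round trip through code page 850
$lostMark = $oem850.GetString($oem850.GetBytes($decomposed))   # "a?" : the combining mark is not in code page 850
$kept     = $oem850.GetString($oem850.GetBytes($composed))     # "ä"
```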
UTF-8 to OEM852 (only for Central or Eastern European NAV systems outside the Baltic states, e.g. in Poland, the Czech Republic etc.)
Code:
Function Convert_UTF8_OEM852
{
# Code page 852 is for Central/Eastern European NAV systems (Poland, Czech Republic etc.), excluding the Baltic states
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(65001)
$targetEncoding = [System.Text.Encoding]::GetEncoding(852)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_OEM852" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
# Normalize to composed form (NFC): a special character stored as base letter + combining mark has no equivalent in the code page
$textfile = $textfile.Normalize([Text.NormalizationForm]::FormC)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
UTF-16 to OEM850 (GetEncoding(1200) = little-endian; for big-endian use GetEncoding(1201) instead)
Code:
Function Convert_UTF16_OEM850
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(1200) # Little-Endian, for Big-Endian 1201
$targetEncoding = [System.Text.Encoding]::GetEncoding(850)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_OEM850" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
# Normalize to composed form (NFC): a special character stored as base letter + combining mark has no equivalent in the code page
$textfile = $textfile.Normalize([Text.NormalizationForm]::FormC)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
ANSI1252 to OEM850
Code:
Function Convert_ANSI1252_OEM850
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(1252)
$targetEncoding = [System.Text.Encoding]::GetEncoding(850)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_OEM850" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
UTF-8 to ANSI1252
Code:
Function Convert_UTF8_ANSI1252
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(65001)
$targetEncoding = [System.Text.Encoding]::GetEncoding(1252)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_ANSI1252" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
# Normalize to composed form (NFC): a special character stored as base letter + combining mark has no equivalent in the code page
$textfile = $textfile.Normalize([Text.NormalizationForm]::FormC)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
UTF-8 to ISO 8859-1
Code:
Function Convert_UTF8_ISO88591
{
$args = resolve-path $args
$WorkingFolder = Split-Path -Parent $args
$sourceEncoding = [System.Text.Encoding]::GetEncoding(65001)
$targetEncoding = [System.Text.Encoding]::GetEncoding(28591)
$convertedFileName = [System.IO.Path]::GetFileNameWithoutExtension($args) +"_ISO8859_1" + [System.IO.Path]::GetExtension($args)
$convertedfile = New-Item -path "$WorkingFolder\$convertedFileName" -type file
$textfile = [System.IO.File]::ReadAllText($args, $sourceencoding)
# Normalize to composed form (NFC): a special character stored as base letter + combining mark has no equivalent in the code page
$textfile = $textfile.Normalize([Text.NormalizationForm]::FormC)
[System.IO.File]::WriteAllText($convertedfile, $textfile, $targetencoding)
Write-host $args 'converted to' $convertedFileName
}
Tags: unicode code page file charset converter UTF16 UTF8 UTF-16 UTF-8 UCS UCS-2