My solution is right in the JavaDoc. I take in a bunch of bytes, but never specify what encoding they're in. Java assumes an encoding based on the platform.
Even worse, I initialize the document with new SimpleDoc(rawCmds.getBytes(), docFlavor, docAttr); without specifying which encoding the bytes are in. This needed to change to new SimpleDoc(rawCmds.getBytes(charset), docFlavor, docAttr);
This article led me to it...
My solution is simply to use the String functions that accept a "Charset" argument, force US-ASCII, and convert to and from bytes using the same charset variable.
EDIT: force "Cp1252"
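A minimal sketch of that change, assuming the raw commands live in a String named rawCmds as in the snippet above. The class name RawDocSketch, the AUTOSENSE byte-array flavor, and the empty attribute set are illustrative assumptions rather than the original code; the only point is that a single Charset constant is used for every conversion between text and bytes.

```java
import java.nio.charset.Charset;
import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.SimpleDoc;
import javax.print.attribute.DocAttributeSet;
import javax.print.attribute.HashDocAttributeSet;

class RawDocSketch {
    // One charset for every String <-> byte[] conversion (Cp1252 per the EDIT above).
    static final Charset CHARSET = Charset.forName("Cp1252");

    // Encode explicitly instead of relying on the platform default.
    static byte[] encode(String rawCmds) {
        return rawCmds.getBytes(CHARSET);
    }

    // Build the print document from the explicitly encoded bytes.
    // The flavor and attribute set are assumptions made for this sketch.
    static Doc buildDoc(String rawCmds) {
        DocFlavor docFlavor = DocFlavor.BYTE_ARRAY.AUTOSENSE;
        DocAttributeSet docAttr = new HashDocAttributeSet();
        return new SimpleDoc(encode(rawCmds), docFlavor, docAttr);
    }

    public static void main(String[] args) throws Exception {
        // Sample text containing characters above 0x7F.
        String rawCmds = "\u00DA\u00C4\u00BF\n";
        byte[] sent = (byte[]) buildDoc(rawCmds).getPrintData();
        // Decoding with the same charset should give back the original text.
        System.out.println(new String(sent, CHARSET).equals(rawCmds));
    }
}
```

Keeping the charset in one constant means the encoding can later be switched in a single place, which is what happened here when US-ASCII was replaced by Cp1252.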
I was asked the following question via email:
ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ DETAILS OF ODD CHARACTERS THAT SHOULD NOT BE ODD ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ
ÕÍÍÍÍÍÑÍÍÍÍÍÍÍÍÍÍÍÍÑÍÍÍÍÍÍÍÍÍÍÑÍÍÍÍÍÍÍÍÍÍÍÍ¸
³ NO. ³ ASCII CODE ³ DISPLAY  ³  REMARKS   ³
ÃÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄ´
³  1  ³    179     ³    ³     ³  WORKS OK  ³
³  2  ³    218     ³    Ú     ³  WORKS OK  ³
³  3  ³    191     ³    ¿     ³  WORKS OK  ³
³  4  ³    194     ³    Â     ³  WORKS OK  ³
³  5  ³    195     ³    Ã     ³  WORKS OK  ³
³  6  ³    180     ³    ´     ³  WORKS OK  ³
³  7  ³    197     ³    Å     ³  WORKS OK  ³
³  8  ³    192     ³    À     ³  WORKS OK  ³
³  9  ³    217     ³    Ù     ³  WORKS OK  ³
³ 10  ³    193     ³    Á     ³  WORKS OK  ³
ÃÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄ´
³     C O N C L U S I O N     ³ S M O O T H³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÙ
Here's my long-winded answer:
As far as Java's default encoding is concerned, it's platform dependent. Conservatively, one would need to specify -Dfile.encoding=Cp1252 at the command line, but of course that is not available when the JVM is started via the web browser.
The best recommendation I could find was from Edward Grech (taken from JVM™ Tool Interface), who recommends creating an environment variable called "JAVA_TOOL_OPTIONS" and setting it to "-Dfile.encoding=Cp1252", which the JVM should pick up each time it is started.
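To see which encoding a given JVM actually ended up with, something like the following can be run on the affected machine. This is only a diagnostic sketch; the class name DefaultEncodingCheck is made up for illustration, and the values printed will differ by platform, JVM version, and launch options.

```java
import java.nio.charset.Charset;

class DefaultEncodingCheck {
    // Name of the charset the JVM uses when no charset is passed explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property that -Dfile.encoding=... sets.
        System.out.println("file.encoding     = " + System.getProperty("file.encoding"));
        // The charset actually in effect for calls such as String.getBytes().
        System.out.println("default charset   = " + defaultCharsetName());
        // Prints null when the environment variable is not set.
        System.out.println("JAVA_TOOL_OPTIONS = " + System.getenv("JAVA_TOOL_OPTIONS"));
    }
}
```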
Last but not least, you can try using the Unicode values directly from the CP1252 chart instead of letting the browser do the conversion.
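As a rough illustration of the Unicode-value approach, the sketch below takes the ten codes from the emailed table, treats each as a Unicode value (what a "\u00DA"-style escape would produce), and encodes it as Cp1252. For values in the range 0xA0 to 0xFF the Cp1252 byte equals the Unicode value, so each code should come back unchanged; the euro sign is included to show that this shortcut does not hold below 0xA0. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

class Cp1252CodePoints {
    static final Charset CP1252 = Charset.forName("Cp1252");

    // Encode a single Unicode value as Cp1252 and return the byte as an unsigned int.
    static int encodedByte(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(CP1252);
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        // Codes taken from the table in the email.
        int[] codes = {179, 218, 191, 194, 195, 180, 197, 192, 217, 193};
        for (int code : codes) {
            System.out.printf("U+%04X -> byte %d%n", code, encodedByte(code));
        }
        // Euro sign: the Unicode value and the Cp1252 byte differ here.
        System.out.printf("U+%04X -> byte %d%n", 0x20AC, encodedByte(0x20AC));
    }
}
```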
Some Additional Information (copied from their respective sites):
Windows-1252 or CP-1252 is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages.
It is very common to mislabel Windows-1252 text data with the charset label ISO-8859-1. Many web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 characters in order to accommodate such mislabeling but it is not standard behaviour and care should be taken to avoid generating these characters in ISO-8859-1 labeled content. However, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.
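The difference between the two labels is easy to reproduce in Java. The sketch below decodes the same three bytes once as windows-1252 and once as ISO-8859-1; the byte values were chosen for illustration and the class name MislabelDemo is hypothetical. Bytes in the 0x80 to 0x9F range are where the two encodings disagree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MislabelDemo {
    // Decode raw bytes with an explicitly named charset.
    static String decode(byte[] data, Charset cs) {
        return new String(data, cs);
    }

    // Print each decoded character as a Unicode code point.
    static void dump(String label, String text) {
        StringBuilder sb = new StringBuilder(label);
        text.codePoints().forEach(cp -> sb.append(String.format(" U+%04X", cp)));
        System.out.println(sb);
    }

    public static void main(String[] args) {
        // 0x80 and 0x93 are printable in windows-1252 but C1 controls in ISO-8859-1;
        // 0xE9 is the same letter in both.
        byte[] data = {(byte) 0x80, (byte) 0x93, (byte) 0xE9};
        dump("windows-1252:", decode(data, Charset.forName("windows-1252")));
        dump("ISO-8859-1:  ", decode(data, StandardCharsets.ISO_8859_1));
    }
}
```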
LINUX + CMAP
The character 0xA1 in cp437 is an accented vowel which is not correct for this code in latin1. So cmap is informing the console driver to react as if the character request were for 0xAD. The console driver goes into the unimap (straight-to-font) and reads the unicode at position 0xAD. This happens to be U+00a1, the inverted exclamation mark. Next stop is the font where the glyph for U+00a1 has to be picked up. In the end, we had a request for 0xA1 but we did not get the character at that position in cp437, we got the inverted exclamation mark for the position 0xA1 in latin1. Our cp437 is behaving like a latin1 font thanks to the cmap.
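The byte values in that passage can be checked from Java, which ties it back to the encoding problem above. This sketch only shows how the same byte decodes under cp437 and latin1; it does not model the Linux console's cmap or unimap. It assumes the JVM ships the IBM437 charset, which belongs to the extended charset set and may be missing from a trimmed-down runtime, so the code checks for it first. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp437VsLatin1 {
    // Decode one byte with the given charset and return its Unicode code point.
    static int decodeByte(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        // latin1 is always available: byte 0xA1 is the inverted exclamation mark.
        System.out.printf("latin1 0xA1 -> U+%04X%n",
                decodeByte(0xA1, StandardCharsets.ISO_8859_1));

        if (!Charset.isSupported("IBM437")) {
            System.out.println("IBM437 is not available in this JVM");
            return;
        }
        Charset cp437 = Charset.forName("IBM437");
        // In cp437 the same byte is an accented vowel...
        System.out.printf("cp437  0xA1 -> U+%04X%n", decodeByte(0xA1, cp437));
        // ...and the inverted exclamation mark sits at 0xAD instead.
        System.out.printf("cp437  0xAD -> U+%04X%n", decodeByte(0xAD, cp437));
    }
}
```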
JAVA DEFAULT ENCODING
Since the command-line cannot always be accessed or modified, for example in embedded VMs or simply VMs launched deep within scripts, a JAVA_TOOL_OPTIONS variable is provided so that agents may be launched in these cases.
By setting the (Windows) environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8, the (Java) System property will be set automatically every time a JVM is started. You will know that the parameter has been picked up because the following message will be posted to System.err: Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8