Friday, September 9, 2016

UTL_FILE and newline character



Two different objections:
1. 1. "The answer using CHR(10) is not correct as CHR(10) is not the platform-specific line terminator(s) on Windows and some other OSs".
2. 2. "The answers that put a newline after "abc" are not valid answers according to your specification."
We'll answer each objection in turn.

CHR(10)
We can certainly understand the response we got, since we felt the same way when it was initially brought to our attention. At that point, however, we took a closer look and discovered some very interesting behavior by Oracle and UTL_FILE regarding the handling of line terminators. 
It is correct that CHR(10) (LineFeed or LF) is not the platform-specific line terminator(s) on Windows (that would be CHR(13) || CHR(10) -- CarriageReturn + LineFeed or CF + LF) but the file is opened in text mode (default) so Oracle translates any LF to the platform-specific line terminator(s) on the platform of the host. So if it runs on Windows, LF is translated to CR + LF, if it runs on Linux, LF stays as LF.

This can be verified on Windows through the following code sample (assuming that the directory TEMP points to c:\temp):
host del c:\temp\quiz.txt
DECLARE
l_file UTL_FILE.file_type;
BEGIN
l_file := UTL_FILE.FOPEN('TEMP', 'quiz.txt', 'W');
UTL_FILE.PUT (l_file, 'First' || chr(10));
UTL_FILE.PUT (l_file, 'Second' || chr(13) || chr(10));
UTL_FILE.PUT (l_file, 'Third' || chr(10) || chr(13));
UTL_FILE.PUT_LINE (l_file, 'Fourth');
UTL_FILE.PUTF (l_file, 'Fifth\n');
UTL_FILE.FCLOSE (l_file);
END;
/
host debug c:\temp\quiz.txt
If you're on Windows Vista or newer, debug is no longer part of your Windows version. ). To get a hex dump of a file under Windows one can use a free utility called HexDump32:

This will demonstrate that the contents of the file is actually (OD = CR, 0A = LF):
00000000 46 69 72 73 74 0D 0A 53-65 63 6F 6E 64 0D 0D 0A First..Second...
00000010 54 68 69 72 64 0D 0A 0D-46 6F 75 72 74 68 0D 0A Third...Fourth..
00000020 46 69 66 74 68 0D 0A - Fifth..
So when a file is opened in text mode, UTL_FILE does this:
1. First: CHR(10) is written as OD OA, CHR(13) + CHR(10).
2. Second: CHR(13) || CHR(10) is written as OD OD OA, CHR(13) + CHR(13) + CHR(10).
3. Third: CHR(10) || CHR(13) is written as OD OA OD, CHR(13) + CHR(10) + CHR(13).
4. Fourth: PUT_LINE correctly puts OD OA, CHR(13) + CHR(10).
5. Fifth: PUTF correctly puts \n as OD OA, CHR(13) + CHR(10).
On Linux, you can similarly verify the behavior (assuming that the directory TEMP points to /tmp):
host rm /tmp/quiz.txt
DECLARE
l_file UTL_FILE.file_type;
BEGIN
l_file := UTL_FILE.FOPEN('TEMP', 'quiz.txt', 'W');
UTL_FILE.PUT (l_file, 'First' || chr(10));
UTL_FILE.PUT (l_file, 'Second' || chr(13) || chr(10));
UTL_FILE.PUT (l_file, 'Third' || chr(10) || chr(13));
UTL_FILE.PUT_LINE (l_file, 'Fourth');
UTL_FILE.PUTF (l_file, 'Fifth\n');
UTL_FILE.FCLOSE (l_file);
END;
/
host hexdump –C /tmp/quiz.txt
Why does Oracle do this? Probably because UTL_FILE is implemented in C and this is how it's done in C's functions fopen and fprintf when a file is opened in text mode. Admittedly, it's not well described in the Oracle documentation.
You can also open files with UTL_FILE in byte mode but in this case you can only use procedures PUT_RAW and GET_RAW. If you use any of the other subprograms UTL_FILE will raise an UTL_FILE.INVALID_OPERATION exception. With byte mode, Oracle doesn't do any translation of newline characters, which is demonstrated in the following (to be run under Windows):
host del c:\temp\quiz.bin
DECLARE
l_file UTL_FILE.file_type;
BEGIN
l_file := UTL_FILE.FOPEN('TEMP', 'quiz.bin', 'wb');
UTL_FILE.PUT_RAW (l_file, utl_raw.cast_to_raw('First' || chr(10)));
UTL_FILE.PUT_RAW (l_file, utl_raw.cast_to_raw('Second' || chr(13) || chr(10)));
UTL_FILE.PUT_RAW (l_file, utl_raw.cast_to_raw('Third' || chr(10) || chr(13)));
UTL_FILE.FCLOSE (l_file);
END;
/
host debug c:\temp\quiz.bin
This demonstrates the following contents:
00000000 46 69 72 73 74 0A 53 65-63 6F 6E 64 0D 0A 54 68 First.Second..Th
00000010 69 72 64 0A 0D - ird..

Newline after "abc"
We argue that the line terminator (= newline) is part of the line (like an End Of File is part of a file so is End Of Line part of a line) so as long as "abc" is put on a separate line (with optional line terminator), coming after 2 empty lines (with line terminator) the code meets the specification.



No comments:

Post a Comment