HOW TO: Split Text File After Specified Line

Hi All,

How can I split a big text file (e.g. 4525 lines of data) into smaller text files after a specified number of lines? (E.g. with a maximum of 500 lines per file this gives 10 smaller files; the last one holds only the remaining 25 lines.)

Thanks in advance.

[305 byte] By [LeonardLee] at [2007-12-25]
# 1

Hi Leonard,

I think the following piece of code does what you want (tested):

using (System.IO.StreamReader sr = new System.IO.StreamReader(@"c:\test.txt")) {
    int lineCount = 0;
    int fileCount = 0;
    System.IO.StreamWriter sw = null;
    string line = null;
    try {
        //read until there are no more lines
        //(checking for null before writing avoids a stray empty line at the end)
        while ((line = sr.ReadLine()) != null) {
            //every 500th line, create a new StreamWriter
            if (lineCount % 500 == 0) {
                //close the previous StreamWriter (if any: the first time there won't be one)
                if (sw != null) {
                    sw.Close();
                }
                //increase the file count & create a new file
                fileCount++;
                sw = new System.IO.StreamWriter(@"c:\test_" + fileCount.ToString() + ".txt");
            }
            //write the line to the current file
            sw.WriteLine(line);
            //increase the line count
            lineCount++;
        }
    }
    finally {
        //close the last StreamWriter (if any)
        if (sw != null) {
            sw.Close();
        }
    }
}
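As a quick sanity check on the result, the expected number of output files and the size of the last one follow from integer arithmetic. This is only a minimal sketch; the class and method names (SplitMath, FileCount, LastFileLines) are made up for illustration:

```csharp
using System;

static class SplitMath
{
    // Number of files needed to hold totalLines when each file
    // holds at most linesPerFile lines (integer ceiling division).
    public static int FileCount(int totalLines, int linesPerFile)
    {
        return (totalLines + linesPerFile - 1) / linesPerFile;
    }

    // Number of lines that end up in the last file.
    public static int LastFileLines(int totalLines, int linesPerFile)
    {
        int remainder = totalLines % linesPerFile;
        // A remainder of 0 means the last file is completely full
        // (or there are no lines at all).
        return remainder == 0 ? Math.Min(totalLines, linesPerFile) : remainder;
    }
}
```

For the 4525-line file from the question, FileCount(4525, 500) gives 10 and LastFileLines(4525, 500) gives 25.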

Hope this helps,

SvenDeBont at 2007-9-3 > top of Msdn Tech,Visual C#,Visual C# Language...
# 2

Thanks. I am very grateful for your help.

I have a few more questions.

1. What if I would like to split the file by size instead (e.g. 2MB for each piece)?

2. How do I copy the first 5 lines of the original file to the top of each of the smaller files?

Thanks in advance.

LeonardLee at 2007-9-3
# 3

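In a text file saved with a single-byte encoding (ASCII or ANSI), every character is one byte, and a Windows line break takes 2 characters (carriage return and line feed). Under that assumption it's not hard to keep track of the size instead of counting the number of lines. Keep in mind that StreamWriter writes UTF-8 by default, where characters outside the ASCII range take more than one byte, so the count is only approximate for such text.

I assume you mean you want to use the first 5 lines of the source file as 'header' lines for each of the new files. This isn't very hard to implement either.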
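The bookkeeping described here counts one byte per character. As a minimal sketch of a more general count, assuming the output is written as UTF-8 (the StreamWriter default), the encoding itself can report how many bytes a line occupies. The names ByteCounter and LineSize are made up for illustration:

```csharp
using System;
using System.Text;

static class ByteCounter
{
    // Bytes a line occupies in the given encoding, including the line
    // break that StreamWriter.WriteLine appends (Environment.NewLine
    // unless the writer's NewLine property was changed).
    public static int LineSize(string line, Encoding encoding)
    {
        return encoding.GetByteCount(line)
             + encoding.GetByteCount(Environment.NewLine);
    }
}
```

For plain ASCII text this gives the same result as line.Length plus the length of the line break; it only differs once a line contains characters that need more than one byte.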

I modified the example I made before to implement these changes:

using (System.IO.StreamReader sr = new System.IO.StreamReader(@"c:\test.txt")) {
    //create an array for the first 5 lines
    string[] header = new string[5];
    string line = null;
    //read the first 5 lines
    for (int i = 0; i < header.Length; i++) {
        header[i] = sr.ReadLine();
    }
    int byteCount = 0;
    int fileCount = 0;
    System.IO.StreamWriter sw = null;
    try {
        //read until there are no more lines
        //(reading before the size check avoids a trailing file with only headers)
        while ((line = sr.ReadLine()) != null) {
            //at the start and after every 2MB, create a new StreamWriter
            if (sw == null || byteCount > 2097152) {
                //close the previous StreamWriter (if any: the first time there won't be one)
                if (sw != null) {
                    sw.Close();
                }
                //reset the byteCount
                byteCount = 0;
                //increase the file count & create a new file
                fileCount++;
                sw = new System.IO.StreamWriter(@"c:\test_" + fileCount.ToString() + ".txt");
                //write the 5 header lines to each file
                foreach (string h in header) {
                    sw.WriteLine(h);
                    byteCount += (h.Length + 2);
                }
            }
            //write the line to the current file
            sw.WriteLine(line);
            //increase the byte count:
            //assumes every char is 1 byte (single-byte text)
            //plus 2 bytes for the newline (CR LF)
            byteCount += (line.Length + 2);
        }
    }
    finally {
        //close the last StreamWriter (if any)
        if (sw != null) {
            sw.Close();
        }
    }
}

The example above contains several hard-coded values that would be better replaced by constants, properties or similar. The code also contains no error handling and does not cope with a source file that has fewer than 5 lines.
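One way those hard-coded values could be pulled out is to pass them as parameters and to check the header length explicitly. The following is only a sketch under assumptions: the names FileSplitter and SplitBySize, the parameter list, and the choice to throw InvalidDataException on a short file are all illustrative and not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class FileSplitter
{
    // Splits sourcePath into files named outputPrefix + number + ".txt".
    // Every output file starts with the first headerLines lines of the
    // source; a new file is started once the current one has grown past
    // maxBytes. Returns the number of files written.
    public static int SplitBySize(string sourcePath, string outputPrefix,
                                  int headerLines, long maxBytes)
    {
        Encoding encoding = new UTF8Encoding(false); // UTF-8 without BOM
        int newLineBytes = encoding.GetByteCount(Environment.NewLine);
        int fileCount = 0;

        using (StreamReader sr = new StreamReader(sourcePath))
        {
            // Read the header; stop early if the file is too short.
            List<string> header = new List<string>();
            string line;
            while (header.Count < headerLines && (line = sr.ReadLine()) != null)
            {
                header.Add(line);
            }
            if (header.Count < headerLines)
            {
                throw new InvalidDataException("Expected " + headerLines
                    + " header lines, found " + header.Count + ".");
            }

            StreamWriter sw = null;
            long byteCount = 0;
            try
            {
                while ((line = sr.ReadLine()) != null)
                {
                    // Start a new file at the beginning and whenever
                    // the current file has passed the size limit.
                    if (sw == null || byteCount > maxBytes)
                    {
                        if (sw != null) sw.Dispose();
                        fileCount++;
                        sw = new StreamWriter(outputPrefix + fileCount + ".txt",
                                              false, encoding);
                        byteCount = 0;
                        foreach (string h in header)
                        {
                            sw.WriteLine(h);
                            byteCount += encoding.GetByteCount(h) + newLineBytes;
                        }
                    }
                    sw.WriteLine(line);
                    byteCount += encoding.GetByteCount(line) + newLineBytes;
                }
            }
            finally
            {
                // Close the last writer even if reading or writing failed.
                if (sw != null) sw.Dispose();
            }
        }
        return fileCount;
    }
}
```

With the values used in this thread, the call would look like FileSplitter.SplitBySize(@"c:\test.txt", @"c:\test_", 5, 2097152).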

Hope this helps,

SvenDeBont at 2007-9-3
# 4
Thanks. I will need some time to understand your code.

Sorry to bother you with this again. Could you separate the file header code from the file size code if you have the time? If not, that is fine too.

Thank you for all your effort in helping me.

LeonardLee at 2007-9-3
# 5

Leonard,

In the sample below, I put the code to retrieve the 'file headers' in a separate method. The method takes 2 arguments: the filename to retrieve the headers from, and the number of lines to return. The method to parse the file (ParseFile) uses this method to get the headers before parsing the file.

Remember that if the file that contains the headers is the same file as the one that needs to be parsed, you'll need to skip the first n lines when you start parsing (where n = the number of header lines you retrieve).

/// <summary>
/// Gets the first n lines out of a file (headers)
/// </summary>
/// <param name="fileName">The filename to get the headers from</param>
/// <param name="lines">The number of lines to get</param>
/// <returns>A string array with the first n lines of a file</returns>
private string[] GetHeader(string fileName, int lines) {
    string[] header = new string[lines];
    using (System.IO.StreamReader sr = new System.IO.StreamReader(fileName)) {
        //read the first n lines
        //(entries stay null if the file has fewer than n lines)
        for (int lineCount = 0; lineCount < lines; lineCount++) {
            header[lineCount] = sr.ReadLine();
        }
    }
    return header;
}

private void ParseFile() {
    //get the file headers
    string[] header = GetHeader(@"c:\test.txt", 5);
    //open the file
    using (System.IO.StreamReader sr = new System.IO.StreamReader(@"c:\test.txt")) {
        int byteCount = 0;
        int fileCount = 0;
        string line = null;
        System.IO.StreamWriter sw = null;
        //the header lines come from the same file as the data,
        //so skip the first n lines before parsing
        for (int i = 0; i < header.Length; i++) {
            sr.ReadLine();
        }
        try {
            //read until there are no more lines
            //(reading before the size check avoids a trailing file with only headers)
            while ((line = sr.ReadLine()) != null) {
                //at the start and after every 2MB, create a new StreamWriter
                if (sw == null || byteCount > 2097152) {
                    //close the previous StreamWriter (if any: the first time there won't be one)
                    if (sw != null) {
                        sw.Close();
                    }
                    //reset the byteCount
                    byteCount = 0;
                    //increase the file count & create a new file
                    fileCount++;
                    sw = new System.IO.StreamWriter(@"c:\test_" + fileCount.ToString() + ".txt");
                    //write the header lines to each file
                    foreach (string h in header) {
                        sw.WriteLine(h);
                        byteCount += (h.Length + 2);
                    }
                }
                //write the line to the current file
                sw.WriteLine(line);
                //increase the byte count:
                //assumes every char is 1 byte (single-byte text)
                //plus 2 bytes for the newline (CR LF)
                byteCount += (line.Length + 2);
            }
        }
        finally {
            //close the last StreamWriter (if any)
            if (sw != null) {
                sw.Close();
            }
        }
    }
}
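Because the header and the data come from the same file here, a variation is to read the header from the reader that is already open. That leaves the reader positioned on the first data line, so the file is opened only once and no separate skip step is needed. A minimal sketch, assuming a hypothetical helper named ReadHeader that accepts any TextReader:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class HeaderReader
{
    // Reads up to 'lines' lines from an open reader and returns them.
    // The array is shorter than 'lines' if the input ends early.
    // Afterwards the reader is positioned on the first line after the
    // header, because ReadLine is not called again once enough lines
    // have been collected.
    public static string[] ReadHeader(TextReader reader, int lines)
    {
        List<string> header = new List<string>();
        string line;
        while (header.Count < lines && (line = reader.ReadLine()) != null)
        {
            header.Add(line);
        }
        return header.ToArray();
    }
}
```

Calling HeaderReader.ReadHeader(sr, 5) right after opening the StreamReader would take the place of both the GetHeader call and the loop that skips the header lines.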

Hope this helps,

SvenDeBont at 2007-9-3