.NETGURU
File chunker
This message was discovered on microsoft.public.dotnet.general.

shiva (VIP)
Does anyone know of a component or sample that shows how to chunk a big file
(approx. 20 MB) into smaller files to keep memory usage down? The file is a
fixed-length text file and doesn't have any segment separators.

Morten Wennevik
Hi shiva,

This piece of code should split a file into smaller files based on how many bytes the smaller files should contain.
Note that if the file is Unicode you would probably need to use a BinaryReader/BinaryWriter to read and write whole characters rather than single bytes.

// Requires: using System.IO;
FileStream fsR = File.Open("MyFile", FileMode.Open);

int size = 1000000; // maximum size in bytes of each smaller file
int count = 0; // bytes written to the current smaller file so far
int i = 0; // the byte just read, or -1 at end of file
int n = 0; // suffix that gives each smaller file a unique name

string filename = "c:\\test.fl"; // base filename

FileStream fsW = File.Create(filename + n); // initial file

try
{
    while((i = fsR.ReadByte()) != -1) // while there are bytes
    {
        if(count >= size) // the current file is full,
        {                 // so close it and start a new one
            n++;
            fsW.Close();
            fsW = File.Create(filename + n);
            count = 0;
        }
        fsW.WriteByte((byte)i); // copy the byte that was taken
        count++;                // from the original file
    }
}
finally
{
    // close both streams even if reading or writing fails
    fsW.Close();
    fsR.Close();
}

--
Happy coding!
Morten Wennevik [C# MVP]
shiva (VIP)
Thanks much Morten :)

My next challenge, and the one I was worried about, is how to get a complete
record out of this chunking. Once I chunk it, I need to map the data to XML.

Below is my sample data. After the chunking, each smaller file should
contain an entire family's data.

Sample data:
ABCDE DIAGNOSTICS Stevens Teresa
ABCD Lakeview Drive
Noblesville, IN 44444 3333333333177762444
F030819580108200412021996 N 01082004
EC
555555555Stevens Michael W

01082004 01082004M00484690306041985
C N
ABCDE DIAGNOSTICS Gabriel Jason
MMMMMM Echo Trail
Indianapolis, IN 55555 6666666663178262999
M093019700101200410292001 N 01012004
FA
055608717Gabriel Stacy L

01012004 01012004F29064573305171972
S N
055608717Gabriel Taylor A

01012004 01012004F31421914109131999
C N
055608717Gabriel Ashley M

01012004 01012004F30525265307102001
C N

"Morten Wennevik" wrote:

[Original message clipped]

Morten Wennevik
On Thu, 9 Sep 2004 08:55:02 -0700, shiva wrote:

[Original message clipped]

Then you need to read enough data to know when you have an entire family, dump that to a file, and read in the next family. To do that, you need to know of some marker that indicates the beginning or the end of a family within the chunk of data you have read.

The pseudocode would be something like this:

while(end of file not found)
{

    do
    {
        read an array of bytes,
        search for family marker
        add this chunk to other chunks stored in memory        
    }
    while(family marker not found && end of file not found)

    dump the family to file, keeping the extra bytes from the last chunk not belonging to this family

    if(file size is above or nearing the limit)
        create new file
}
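
The pseudocode above could be sketched in C# roughly as follows. This is only an illustration under assumptions that are not stated in the thread: the data is read as lines of text, every family starts with a line that begins with a known marker string (the sample data suggests something like "ABCDE DIAGNOSTICS", but the real marker may differ), and the names FamilySplitter, ReadFamilies and Split are made up for the example. If the file really has no line breaks, the same idea should apply to fixed-length records instead of lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class FamilySplitter
{
    // Reads lines from 'reader' and yields one string per family.
    // ASSUMPTION: a family starts at any line beginning with 'marker'.
    // Only one family is held in memory at a time.
    public static IEnumerable<string> ReadFamilies(TextReader reader, string marker)
    {
        StringBuilder family = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A new marker line closes the family collected so far;
            // the marker line itself is kept for the next family.
            if (line.StartsWith(marker, StringComparison.Ordinal) && family.Length > 0)
            {
                yield return family.ToString();
                family.Length = 0;
            }
            family.AppendLine(line);
        }
        if (family.Length > 0)
            yield return family.ToString(); // last family in the file
    }

    // Writes whole families to baseName0, baseName1, ... and starts a new
    // file when adding the next family would exceed 'limit' characters.
    // Returns the number of files created.
    public static int Split(TextReader reader, string marker, string baseName, long limit)
    {
        int n = 0;
        long written = 0;
        StreamWriter writer = new StreamWriter(baseName + n);
        try
        {
            foreach (string family in ReadFamilies(reader, marker))
            {
                if (written > 0 && written + family.Length > limit)
                {
                    writer.Close();
                    n++;
                    writer = new StreamWriter(baseName + n);
                    written = 0;
                }
                writer.Write(family);
                written += family.Length;
            }
        }
        finally
        {
            writer.Close();
        }
        return n + 1;
    }
}
```

Because a family is never cut in half, an output file can end up larger than the limit when a single family is bigger than the limit on its own. The limit here counts characters, not bytes, which is only an approximation for multi-byte encodings.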

--
Happy coding!
Morten Wennevik [C# MVP]
shiva (VIP)
Thanks much again! That was an excellent idea :)

In reality, reading the big file the first time already runs out of memory,
before I can even split it up.

Now I am thinking I can't empty a well full of water in one shot; I have to
take it out bucket by bucket.

Is there any API I can use to read only, for example, 1 MB at a time
instead of the entire file?

Thank you,
Shiva

"Morten Wennevik" wrote:

[Original message clipped]
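
The thread as archived here has no reply to this last question. One option that should fit is Stream.Read, which fills a caller-supplied buffer and returns how many bytes it actually read, so only one buffer's worth of the file is in memory at a time. The sketch below is a minimal illustration, not a tested solution for this data; the names ChunkedReader, ReadInChunks and CountBytes are hypothetical, and the 1 MB buffer size is simply the figure from the question.

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Calls 'handle' once per chunk of at most 'chunkSize' bytes and
    // returns the total number of bytes read. The same buffer is reused
    // for every chunk, so 'handle' must not keep a reference to it.
    public static long ReadInChunks(Stream input, int chunkSize, Action<byte[], int> handle)
    {
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;
        // Read returns 0 only at end of stream; it may return fewer
        // bytes than requested, so always use the returned count.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            handle(buffer, read);
            total += read;
        }
        return total;
    }

    // Usage example: walks a file 1 MB at a time and counts its bytes.
    public static long CountBytes(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return ReadInChunks(fs, 1024 * 1024, delegate(byte[] chunk, int length)
            {
                // process chunk[0 .. length-1] here
            });
        }
    }
}
```

A chunk boundary can fall in the middle of a record, so any leftover bytes of an incomplete record would need to be carried over to the next chunk, as in the pseudocode earlier in the thread.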
