October 16th, 2007, 7:00 pm
Quote, originally posted by kjeld:
"Your approach sounds like a good one; with not too much parsing I would expect at least 10 MB/s of throughput to be achievable (so 100 seconds for the whole file). The vector of tickers should in principle be no problem regarding memory. However, what is the intention of that vector? Do you use it on each new line to check whether it contains a new ticker? If you like I could take a look at the code; no need to send the 1 GB over, since we have similar files around. -Kjeld"

Wow, 100 seconds? I'm getting 20 minutes here... Am I doing something wrong? (The code is shown below.)

The vector is just for saving the unique tickers and testing uniqueness on each iteration. Yes, it is consulted for each line, and it grows as new unique tickers come in.

Here's the code for getting the size (number of rows and columns) of the file. The unique-tickers function is a little more complex, but its structure is very similar to this one. Btw, I'm using MV Express.

    int bigTextFile::getSize()
    {
        ifstream ifs(fileName);
        if (!ifs) {
            cout << "Something is wrong with " << fileName << endl;
            system("PAUSE");
            terminate();
        }
        cout << "File loaded successfully!" << endl;
        cout << "Reading " << fileName << " contents. Please hold..." << endl << endl;

        int countnr = 0; // number of rows (lines)
        int countnc = 1; // number of columns

        string line;
        // Loop on getline, not on !ifs.eof(): testing eof() first miscounts the last line.
        while (getline(ifs, line))
        {
            if (countnr == 0) // on the first iteration, get the number of columns
            {
                int len = line.length();
                char* myptr2 = new char[len + 1]; // +1 for the terminator; new char[len] overran the buffer
                line.copy(myptr2, len, 0);
                myptr2[len] = 0;
                const char* linTok = strtok(myptr2, ";");
                while (linTok != NULL)
                {
                    linTok = strtok(NULL, ";");
                    if (linTok != NULL)
                        countnc++; // count the number of columns
                }
                delete[] myptr2; // the original leaked this buffer
            }
            countnr++; // count the number of rows
        }
        nrow = countnr;
        ncol = countnc;
        ifs.close();
        return 1;
    }
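For comparison, here is a sketch of the same row/column count done with std::getline and a string stream instead of strtok, which avoids the manual buffer copy entirely. The function and file names are illustrative, not from the original bigTextFile class:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Count rows and ';'-separated columns of a delimited text file.
// Columns are taken from the first line; rows are every line read.
// Returns false if the file could not be opened.
// (Illustrative sketch, not the poster's original method.)
bool countRowsCols(const char* path, int& nrow, int& ncol)
{
    std::ifstream ifs(path);
    if (!ifs)
        return false;

    nrow = 0;
    ncol = 0;
    std::string line;
    while (std::getline(ifs, line)) {
        if (nrow == 0) {
            // Split the header line on ';' without any raw-pointer work.
            std::istringstream ss(line);
            std::string field;
            while (std::getline(ss, field, ';'))
                ++ncol;
        }
        ++nrow;
    }
    return true;
}
```

Since each line is only scanned once and no heap buffer is allocated per line, this kind of loop is usually I/O-bound, which is roughly what the 10 MB/s estimate assumes.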
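One likely reason for the 20-minute runtime: testing each ticker against a growing vector is a linear scan per line, so the whole pass is quadratic in the number of unique tickers. A std::set makes each membership check logarithmic. A minimal sketch, with hypothetical names not taken from the original code:

```cpp
#include <set>
#include <string>
#include <vector>

// Collect unique tickers in first-seen order.
// std::set::insert reports in .second whether the element was new,
// so each line costs O(log n) instead of a linear scan of a vector.
// (Illustrative sketch, not the poster's original class.)
class TickerCollector {
public:
    // Returns true if 'ticker' had not been seen before.
    bool add(const std::string& ticker)
    {
        if (seen_.insert(ticker).second) {
            ordered_.push_back(ticker); // preserve first-seen order
            return true;
        }
        return false;
    }

    const std::vector<std::string>& tickers() const { return ordered_; }

private:
    std::set<std::string> seen_;
    std::vector<std::string> ordered_;
};
```

The extra vector is only needed if the original insertion order matters; otherwise iterating the set directly gives the tickers in sorted order.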
Last edited by msperlin on October 15th, 2007, 10:00 pm, edited 1 time in total.