Welcome to Dream.In.Code



A CSV Story

 

A CSV Story, CSV Validation

rwaldron
post 31 Mar, 2008 - 03:20 AM
Post #1


New D.I.C Head

Joined: 31 Mar, 2008
Posts: 6

Hiya, I found this brilliant article - Intro to A Regular Expression, a CSV Story
http://www.dreamincode.net/forums/blog/mar...p?showentry=587

By Martyr2

I hope Martyr2 might read this too!

I was hoping that you could help me.

I am looking to write some code in C# or VB that will do the following.

Check a number of CSV files in a particular folder (say c:\test). The files could have any name, so *.csv.

There are 2 major checks that I am looking to do.

(1) That there are a total of 14 fields per line in the CSV.

(2) That each field cannot contain data longer than a certain amount.

E.g.:

A real example of what my CSV files contain is below (2 lines):

0067,Medium,7NOK2155,,,CHEMDRY INFO,9 The Avenue,Riverside Manor,DrumCare,Co Kildare,028BS007VTTT3000,Donnchadh,McGinn,042431400
0067,Medium,7NOK2153,,,DPS ACC,5 The Dales,Riverside Manor,Kilcullen,Co Ofally,028BSsddLEIR3000,Liam,grant,045481422


Max lengths are:
field1   max length 9
field2   max length 10
field3   max length 15
field4   max length 19
field5   max length 10
field6   max length 36
field7   max length 36
field8   max length 36
field9   max length 36
field10  max length 36
field11  max length 16
field12  max length 36
field13  max length 36
field14  max length 16
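Both checks can also be done without a regular expression. A minimal sketch, assuming no field ever contains a quoted comma (true of the two sample lines, not of CSV in general): split each line on commas, require exactly 14 fields, then compare each field with the max lengths listed above. `FieldLengthCheck` and `IsValidLine` are hypothetical names, and this split-based check is a swapped-in technique, not the regex approach used in this thread.

```csharp
// Hypothetical helper: checks one CSV line by splitting on commas instead of using a regex.
// Assumes no field contains a quoted comma (true of the sample data above).
public static class FieldLengthCheck
{
    // Max length of field1..field14, in order, taken from the list above
    private static readonly int[] MaxLengths =
        { 9, 10, 15, 19, 10, 36, 36, 36, 36, 36, 16, 36, 36, 16 };

    public static bool IsValidLine(string line)
    {
        string[] fields = line.Split(',');

        // Check (1): exactly 14 fields on the line
        if (fields.Length != MaxLengths.Length)
            return false;

        // Check (2): no field is longer than its limit
        for (int i = 0; i < fields.Length; i++)
        {
            if (fields[i].Length > MaxLengths[i])
                return false;
        }
        return true;
    }
}
```

With this layout, changing a single limit means editing one number in `MaxLengths` rather than the pattern.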

A regular expression that I have come up with for the field lengths is below, but I think I can tighten it up later.
CODE
Dim matchpattern as String = "^[^,]{0,9},[^,]{0,10},[^,]{0,15},[^,]{0,19},[^,]{0,10},(?:[^,]{0,36},){5}[^,]{0,16},(?:[^,]{0,36},){2}[^,]{0,16}(?:\r\n|$)"



What I need help with is the general process of the application, i.e.:

Check all files in c:\test.
If a CSV file matches the regular expression for the number of fields (14) and no field exceeds its limit, then save the file to c:\test\good.
If a CSV file fails to match the expression, then save it to c:\test\bad.
Remove the CSV file from c:\test.

Your example (C#) is very close to what I need.
Could you help me with the overall code?

Thx,
Ray..
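The process described in this post could look roughly like the sketch below. It is an outline under stated assumptions, not Martyr2's code: `CsvSorter` and `SortFolder` are hypothetical names, the per-line check is passed in as a delegate so any of the validation approaches discussed here can be plugged in, and files are moved with `File.Move` so that they disappear from c:\test.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical sketch: validate every *.csv in a folder, then MOVE each file into
// a "good" or "bad" subfolder so it no longer sits in the source folder.
public static class CsvSorter
{
    // isValidLine: any per-line check (a regex IsMatch, a split-and-measure test, ...)
    public static void SortFolder(string folder, Func<string, bool> isValidLine)
    {
        string goodDir = Path.Combine(folder, "good");
        string badDir = Path.Combine(folder, "bad");
        Directory.CreateDirectory(goodDir); // does nothing if the folder already exists
        Directory.CreateDirectory(badDir);

        // GetFiles searches only the top level, so files already moved to good/bad are not re-read
        foreach (string path in Directory.GetFiles(folder, "*.csv"))
        {
            bool valid;
            try
            {
                // All() stops at the first failing line; its foreach disposes the
                // underlying reader, so the file is closed before it is moved.
                // An empty file counts as valid.
                valid = File.ReadLines(path).All(isValidLine);
            }
            catch (IOException)
            {
                valid = false; // a file that cannot be read is treated as bad
            }

            string target = Path.Combine(valid ? goodDir : badDir, Path.GetFileName(path));
            if (File.Exists(target))
                File.Delete(target); // the two-argument File.Move does not overwrite
            File.Move(path, target);
            Console.WriteLine("{0} -> {1}", Path.GetFileName(path), valid ? "good" : "bad");
        }
    }
}
```

For example, `CsvSorter.SortFolder(@"c:\test", line => line.Split(',').Length == 14);` would sort on field count alone; a stricter per-line check can be passed in the same way.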



Martyr2
post 31 Mar, 2008 - 10:20 AM
Post #2


Programming Theoretician

Joined: 18 Apr, 2007
Posts: 4,270



Thanked 72 times

Expert In: C/C++, Java, VB, VB.NET, C#, PHP, Web Development, HTML & CSS, Javascript



Hello rwaldron,

I am glad you liked my blog article on regular expressions. I am always flattered when I can help someone out through the blog.

As for your problem, you are going to love me, I am sure. I have taken my example from the blog and modified it to meet your criteria. Some of the changes I have made are...

1.) Made my main function loop through files of the given directory looking for csv files (c:\Test)
2.) Changed my class from using Run to now using TestFile and it returns a boolean (true/false) as to whether or not it passed our test.
3.) Used the return value from our new function to then copy the files to the given folder based on the result. If it returns true, it is a good file. If it returns false it is a bad file.

While I have brought you 99% of the way there, I always love to leave 1% as homework for the user. You will notice I did not change the pattern or the field count. I will leave that up to you. I have also made the code copy the files, but it does not remove them from the directory. I am sure you can figure that part out yourself too. You can keep my copy in place or choose to use a function like File.Move(). Up to you.

So here is the code fully documented as usual...

csharp

using System;
using System.IO;

// Notice the namespace for using regular expressions. .NET has an entire namespace dedicated to the topic.
using System.Text.RegularExpressions;


namespace experimentalconsole
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of our class and run it
            checkCSV csv = new checkCSV();

            // Make sure the destination folders exist (CreateDirectory does nothing if they already do)
            string goodDir = Path.Combine("c:\\Test", "good");
            string badDir = Path.Combine("c:\\Test", "bad");
            Directory.CreateDirectory(goodDir);
            Directory.CreateDirectory(badDir);

            // Get the files in the given directory matching the csv extension
            String[] Filenames = Directory.GetFiles("c:\\Test", "*.csv");

            // Loop through all the files returned from that directory
            foreach (String filename in Filenames)
            {
                // Check to make sure each one still exists
                if (File.Exists(filename))
                {
                    // Run it through our test
                    bool valid = csv.TestFile(filename);

                    // Pull off the filename from the path
                    String basename = Path.GetFileName(filename);

                    // If the file passed testing, write to screen it was valid
                    // Then copy it to the good folder under c:\Test
                    // (true = overwrite a copy left over from an earlier run)
                    if (valid)
                    {
                        Console.WriteLine(filename + " is valid");
                        File.Copy(filename, Path.Combine(goodDir, basename), true);
                    }
                    else
                    {
                        // Failed test, copy it to the bad folder under c:\Test
                        Console.WriteLine(filename + " is invalid");
                        File.Copy(filename, Path.Combine(badDir, basename), true);
                    }
                }
            }
        }
    }

    class checkCSV
    {
        // Let's set up our pattern first
        private static string csvRegexPattern = @"([^,""]+|""([^""]|"""")*""|,,)";

        // Let's now set up a Regex object using the pattern
        // This creates a Regex object using the pattern passed as a parameter to the constructor
        private static Regex _Regex = new Regex(csvRegexPattern);

        // This method replaced Run in the previous example and now returns true or false
        // to indicate whether the file passed our tests
        public bool TestFile(string AFile)
        {
            String lineRead;
            int lineNumber = 0;

            try
            {
                // "using" closes the reader even when we return early,
                // so the file is not left locked if you later move or delete it
                using (StreamReader sr = new StreamReader(AFile))
                {
                    // Loop through the file, reading each line
                    while (null != (lineRead = sr.ReadLine()))
                    {
                        // Increment the line number
                        lineNumber++;

                        if (_Regex.Matches(lineRead).Count != 5)
                        {
                            // Return false because it didn't match our field count of 5
                            // Change this to 14 and use your pattern instead of mine.
                            return false;
                        }
                    }
                }
            }
            catch (Exception e)
            {
                // If there was an error, let's see what it was, return false.
                Console.WriteLine("File could not be read: {0}", e.Message);
                return false;
            }

            // Must have passed our test, return true.
            return true;
        }
    }
}


Read through the in code comments to see what has been changed and what you need to do where. After you make your few changes, you will have something working within about ohhhh 15 minutes.

Enjoy and thanks for reading my blog! :)

"At DIC we be regular expression blogging code ninjas!"

rwaldron
post 1 Apr, 2008 - 04:04 AM
Post #3


New D.I.C Head

Joined: 31 Mar, 2008
Posts: 6

Thank you so much for your help, Marty.
I will review the code and test it over the next few days.
I will then let you know how I get on.
I take it that this is the best place to communicate, rather than email?
I will be in touch. Love is such a strong word, but I really do appreciate the help!
Ray (Ireland)

rwaldron
post 7 Apr, 2008 - 03:41 AM
Post #4


New D.I.C Head

Joined: 31 Mar, 2008
Posts: 6

Hiya Marty,
I have been testing away with the code.
I have a regular expression that passes my CSV when I test it using a regular expression designer.
The full expression is at the very end of this reply.

But when I run your code, I constantly get "is invalid" for the file.
So I stripped my file back to just one field and tested: passed.
Then I tested using 2 fields: failed.

I.e.:

CSV file just contains:
0067
private static string csvRegexPattern = @"(^[^,]{0,9})";
if (_Regex.Matches(lineRead).Count != 1)
No error ------> Valid


CSV file just contains:
0067,medium
private static string csvRegexPattern = @"(^[^,]{0,9},[^,]{0,10})";
if (_Regex.Matches(lineRead).Count != 2)
Error ---> Invalid??

Also, if the CSV file just contains:
0067123456789
private static string csvRegexPattern = @"(^[^,]{0,9})";
if (_Regex.Matches(lineRead).Count != 1)
No error ------> Valid, but this should fail because the field is longer than 9 characters.

Any more help, please?
Ray

P.S. The full expression that matches in my regex designer is:
@"(^[^,]{0,9},[^,]{0,10},[^,]{0,15},[^,]{0,19},[^,]{0,10},(?:[^,]{0,36},){5}[^,]{0,16},(?:[^,]{0,36},){2}[^,\r\n]{0,16})";
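A short sketch that reproduces the three results above, assuming the same `Regex` calls as in Martyr2's `TestFile`. It suggests why the counts disagree: a pattern that describes a whole line produces one match per line, not one match per field, and without a trailing `$` the pattern is satisfied by the first 9 characters of a longer field. `AnchorDemo` and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo of the three test cases in post #4
public static class AnchorDemo
{
    // Case 2: "0067,medium" with the 2-field pattern. The whole pattern matches once,
    // and ^ cannot match again later in the line, so Count is 1, not 2
    public static int TwoFieldCount()
    {
        return Regex.Matches("0067,medium", @"(^[^,]{0,9},[^,]{0,10})").Count; // 1
    }

    // Case 3: without $, the pattern is satisfied by the first 9 characters
    // and the rest of "0067123456789" is simply ignored
    public static bool UnanchoredAcceptsLongField()
    {
        return Regex.IsMatch("0067123456789", @"^[^,]{0,9}"); // true
    }

    // With $ added, the match must end at the end of the line, so the 13-character field fails
    public static bool AnchoredAcceptsLongField()
    {
        return Regex.IsMatch("0067123456789", @"^[^,]{0,9}$"); // false
    }
}
```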




rwaldron
post 14 Apr, 2008 - 03:51 AM
Post #5


New D.I.C Head

Joined: 31 Mar, 2008
Posts: 6

Hiya,
If I leave the count set to 1, then everything works fine:


CODE
if (_Regex.Matches(lineRead).Count != 1)
                    {
                        // Return false because it didn't match our field count of 5
                        // Change this to 14 and use your pattern instead of mine.
                        return false;
                    }


So the count doesn't really work as a field count.
If I leave it at 1, the checker reports valid/invalid correctly,
even on files that contain multiple lines.
There was no real need for this part to count the number of fields,
because if a line doesn't match the expression exactly, it will fail.
So there is no need to count the fields, is there?

What do ya think,
Ray..
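This reasoning holds as long as the pattern is anchored at both ends. The full expression in post #4 starts with `^` but has no `$`, so a line with a 15th field, or a 14th field longer than 16 characters, would still produce its one match and pass. A minimal sketch of `TestFile` rewritten along those lines, assuming the post #4 pattern with a `$` appended and one `IsMatch` per line instead of counting matches (`CheckCsvAnchored` is a hypothetical class name):

```csharp
using System.IO;
using System.Text.RegularExpressions;

public class CheckCsvAnchored
{
    // The full pattern from post #4, with $ appended so the match must reach the end of the line.
    // ReadLine() already strips \r\n, so nothing else is needed at the end.
    private static readonly Regex LinePattern = new Regex(
        @"^[^,]{0,9},[^,]{0,10},[^,]{0,15},[^,]{0,19},[^,]{0,10}," +
        @"(?:[^,]{0,36},){5}[^,]{0,16},(?:[^,]{0,36},){2}[^,\r\n]{0,16}$");

    // Same contract as TestFile above: true only if every line matches the whole pattern
    public bool TestFile(string path)
    {
        try
        {
            using (StreamReader sr = new StreamReader(path))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    // No match counting: one anchored IsMatch checks both
                    // the field count (14) and every field length
                    if (!LinePattern.IsMatch(line))
                        return false;
                }
            }
        }
        catch (IOException)
        {
            return false; // missing or unreadable file counts as invalid
        }
        return true;
    }
}
```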

plbro
post 1 Jul, 2008 - 11:43 PM
Post #6


New D.I.C Head

Joined: 1 Jul, 2008
Posts: 1

Hi friends,

I have a question regarding reading CSV files.

I am retrieving my contact address list from Yahoo Mail, which gives the response in CSV format.

The problem is that when we add an address in Yahoo with a line feed (pressing the Enter key), the CSV contains \r\n within the quotes. For example:

Name",".","L","pnl","plbro@gmail.com","Friend", ...[more info].... "Houseno, Streetname","city","State","pin","India","DOB","","","","","","some notes","","","","","","","","","","","","","","","",""

It should come as a single line, but the line breaks after houseno (in the example), because the CSV contains

"houseno \r\n Streetname", and there is also a \r\n at the end of the line.

I can't simply remove every \r\n, because I read the CSV line by line. So how do I remove only the \r\n that are inside double quotes, as in the address fields?

How can I get rid of this? Is there a regular expression to parse this?

Thanks in advance
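Reading line by line is what splits these records, so one option is to stop treating a physical line as a record and instead scan the whole text while tracking whether you are inside double quotes. The sketch below is a small hand-written state machine (not a regular expression and not a library API); `QuotedCsv.Parse` and its `newlineReplacement` parameter are hypothetical. It assumes quotes only surround whole fields and that a doubled quote ("") inside a quoted field means a literal quote.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: split CSV text into records and fields while honouring double quotes,
// so a line break inside "..." stays part of the field instead of ending the record.
public static class QuotedCsv
{
    // newlineReplacement: text to put in place of a line break found inside quotes
    // (e.g. " " to flatten multi-line addresses); null keeps the break unchanged.
    public static List<List<string>> Parse(string text, string newlineReplacement)
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bool crlf = c == '\r' && i + 1 < text.Length && text[i + 1] == '\n';

            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < text.Length && text[i + 1] == '"') { field.Append('"'); i++; } // "" -> "
                    else inQuotes = false;                                                      // closing quote
                }
                else if ((c == '\r' || c == '\n') && newlineReplacement != null)
                {
                    if (crlf) i++;                 // treat \r\n as one break
                    field.Append(newlineReplacement);
                }
                else
                {
                    field.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;                   // opening quote
            }
            else if (c == ',')
            {
                record.Add(field.ToString());      // end of field
                field.Clear();
            }
            else if (c == '\r' || c == '\n')
            {
                if (crlf) i++;                     // end of record (\r\n, \n or \r)
                record.Add(field.ToString());
                field.Clear();
                records.Add(record);
                record = new List<string>();
            }
            else
            {
                field.Append(c);
            }
        }

        // Last record when the text does not end with a line break
        if (field.Length > 0 || record.Count > 0)
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```

For example, `QuotedCsv.Parse(File.ReadAllText(path), " ")` would return each contact as one record, with the line break inside the address field replaced by a space.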
