# VB - String comparison query



## sdhayes (Apr 14, 2012)

Can anyone help with the following query?

I have a series of txt files (called mem01.txt, mem02.txt etc.) which contain, amongst other data, a member id. This id will always be the second entry on each line in the relevant part of the file, but may be space or tab delimited, e.g.:

BLOGGSJ 4909981 SUBS
BLOGGSK 5592267 COACH FEE

When a new id is created, this is in a new file, of the same format. A new member file may have more than one new record.

I need a way to check the id field in all files against the id(s) in my new member file. If a new member id already exists in any file, I want to raise an error, but only for that id, not for any others in that file.

I suspect I need to output the existing ids to a file and then check the new ids against it, but I have had no joy trying to do this.

Any thoughts on the cleanest way to do this?


----------



## shuuhen (Sep 4, 2004)

For parsing the files, you'll probably want to look into regular expressions. The two approaches would be:

- Use regular expression captures to directly pull the parts you care about out of the lines.
- Use a split function/method to tokenize your lines. You should be able to find a decent split function or method in most languages. It could be part of a string object, a string processing library or a regular expression library. Some might only accept a character or static string, but a good split function will use regular expressions.

For the other part, check if your language has a "set" type. Similar to hashes/maps, it will store a key. The difference is it works well for when you don't need to associate a value with a key. A good set type should allow you to do set operations, like set union, set difference, etc. If you go with this approach, you might want to look for a simple introduction to set theory (you shouldn't need to learn much if you don't already know it).
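As a rough illustration of the set idea in VB.NET (the language this thread is about), `HashSet(Of String)` offers intersection and difference directly. This is only a sketch under assumptions: the module name `SetCheckDemo`, the function names `Clashes` and `Unused`, and the sample ids are made up for illustration, and real code would first extract the ids from the member files.

```vb
Imports System
Imports System.Collections.Generic

Module SetCheckDemo

    ' Ids present in both collections (set intersection).
    Function Clashes(existingIds As IEnumerable(Of String),
                     newIds As IEnumerable(Of String)) As HashSet(Of String)
        Dim result As New HashSet(Of String)(newIds)
        result.IntersectWith(existingIds)
        Return result
    End Function

    ' New ids that are not in use yet (set difference).
    Function Unused(existingIds As IEnumerable(Of String),
                    newIds As IEnumerable(Of String)) As HashSet(Of String)
        Dim result As New HashSet(Of String)(newIds)
        result.ExceptWith(existingIds)
        Return result
    End Function

    Sub Main()
        ' Hypothetical ids; real code would extract them from the member files.
        Dim existingIds As String() = {"1001", "1002", "1003"}
        Dim newIds As String() = {"1003", "1004"}

        For Each id As String In Clashes(existingIds, newIds)
            Console.WriteLine("Member ID " & id & " is already in use.")
        Next
        For Each id As String In Unused(existingIds, newIds)
            Console.WriteLine("New id " & id)
        Next
    End Sub

End Module
```

Because the check is per id, a clash on one new id does not block the other ids in the same new member file.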

I've never been a fan of VB or other .NET stuff, so I can't point you to VB libraries/documentation, but the functionality I've mentioned is common between languages.


----------



## sdhayes (Apr 14, 2012)

Thanks for the suggestion. I've used Split and have now succeeded in getting the member ids out to a file. Now I am reading my new member file and getting the id. When I then compare to see if this id already exists, it is not working correctly.

eg:
My member file (memid.txt) has the following rows:

1001
1002
1003

My new member file (a file ending .nmf) has its member id field set to 1003. What's happening is that my current code (below) reads memid.txt line by line, so it tells me 1003 doesn't exist for each line until it reaches line 3 of memid.txt and says "it does exist".
I think I need some form of 'ReadAllLines' function, but I can't figure out the syntax :banghead:

My current code, with notes: 

```
For Each newid As String In My.Computer.FileSystem.GetFiles(MbDir)
    idfile = System.IO.Path.GetFileName(newid)
    idfileInfo = New System.IO.FileInfo(idfile)
    'Check for new member files
    If LCase(idfileInfo.Extension) = ".nmf" Then
        'New member file found, so get id field from member id line (newmemid(1) in this case)
        Using idreader = New StreamReader(MBDir & newid)
            Dim idrdline = idreader.ReadLine()
            Dim newmemid() As String

            While idrdline IsNot Nothing
                If idrdline.Contains("_MEMID") Then
                    newmemid = Split(idrdline, vbTab)
                    'Now newmemid array created, check if the new member id exists in memid.txt
                    Using memidrd = New StreamReader(MBDir & "memid.txt")
                        Dim memidrdline = memidrd.ReadLine()
                        While memidrdline IsNot Nothing
                            If newmemid(1) = memidrdline Then
                                MsgBox("Member ID " & newmemid(1) & " is already in use.")
                                Close()
                            Else
                                MsgBox("new id" & newmemid(1))
                                'Streamwriter contents will be in here to write the new record.
                            End If
                            memidrdline = memidrd.ReadLine()
                        End While
                    End Using
                End If
                newmemid = Nothing
                idrdline = idreader.ReadLine()
            End While
        End Using
    End If
Next
```

I have tried both `If newmemid(1) = memidrdline` and `If memidrdline = newmemid(1)`. I know this is because I am checking line by line.

How can this be solved?


----------



## sdhayes (Apr 14, 2012)

Edit - have solved this with ReadAllLines.

```
Dim memText() As String = File.ReadAllLines(MBDir & "memid.txt")
If memText.Contains(newmemid(1)) Then....
```


----------



## sdhayes (Apr 14, 2012)

Ok - thought I had this sorted but still have a problem which I've been trying to solve for a couple of days without success.

Using Split I have managed to extract and compare my new member id field. This is working fine where the lines are tab separated. However, some are space delimited instead, and my routine is crashing with an out of array error for these.

I've read that Split cannot handle multiple delimiters, so instead thought I would try using Replace on the resultant combined text file to replace all spaces with tabs. This causes a crash. 

I do not have the option of editing the text files to manually replace space with tab (multiple users, multiple files).

Could anyone give me the syntax to work into the following code so that it splits the line by tab and, if the resulting array has no id field, splits the line by space instead and gives me the id field? The code below isn't working.
Also, because I don't know how many spaces there may be, I need the code to return the member id (2nd field) in the same array position each time.

Example lines in the files

MEM_TYPE 11223 Jbloggs
MEM_TYPE 12345 Fbloggs

Current code:
```
Using memreader = New StreamReader(MemDir & memfile)
    Dim memLine = memreader.ReadLine()
    Dim memcontents() As String
    Dim memWriter
    While memLine IsNot Nothing
        If memLine.Contains("_TYPE") Then
            memcontents = Split(memLine, vbTab)
            'Check to see if the array has value in the id field. If it does, then this is a tab separated file, so write out as normal
            If memcontents(1) <> "" Then
                memWriter = New System.IO.StreamWriter(TempInstDir & "memid.txt", True)
                memWriter.WriteLine(memcontents(1))
                memWriter.Close()
            Else
                'This wasn't tab separated, so re-split using space instead
                rememcontents = Split (memline, )
                memWriter = New System.IO.StreamWriter(TempInstDir & "memid.txt", True)
                memwriter.Writeline(rememcontents(1))
                memwriter.Close()
            End If
        End If
        memcontents = Nothing
        memLine = memreader.ReadLine()
    End While
End Using
```


----------



## AlbertMC2 (Jul 15, 2010)

Hi

Maybe, as shuuhen said above, you should be looking at VB and regular expressions.
See these documents on creating and matching regular expressions using VB:

- Regex Class (System.Text.RegularExpressions)
- Regular Expressions In .NET - Programmer's Heaven
- How to match a pattern using regular expressions in Visual Basic .NET or in Visual Basic 2005


----------



## AlbertMC2 (Jul 15, 2010)

Hi

Sorry for the double post.
I have re-read your thread and just wanted to clarify:
- You are still trying to extract the Member IDs from the txt files?
- You just have to read the first line of the txt file to get the ID?
- The ID is always the 2nd "word"?
- The ID either has a space before and after or a tab before and after?
- Could there be multiple tabs/spaces before and after the ID?
- The ID is always a number/integer?


----------



## AlbertMC2 (Jul 15, 2010)

Hi

Me again.
See the following code:

```
Imports System.IO
Imports System
Imports System.Text.RegularExpressions
'-----------------------------------------------------------------------------------
Public Class Form1
'-----------------------------------------------------------------------------------
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

        Dim memreader = New StreamReader("c:\test.txt")
        Dim memLine = memreader.ReadLine()
        Dim memcontents() As String
        Dim memWriter = New System.IO.StreamWriter("c:\memid.txt", True)

        While memLine IsNot Nothing
            If memLine.Contains("_TYPE") Then
                memcontents = Split(memLine, vbTab)
                If memcontents.Length = 1 Then
                    memcontents = Split(memLine, " ")
                End If
                memWriter.WriteLine(memcontents(1))
                memcontents = Nothing
            End If
            ' Advance on every pass; otherwise a line without "_TYPE" loops forever
            memLine = memreader.ReadLine()
        End While
        memreader.Close()
        memWriter.Close()

    End Sub
'-----------------------------------------------------------------------------------
    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click

        Dim sr As New StreamReader("c:\test.txt")
        Dim memWriter = New System.IO.StreamWriter("c:\memid.txt", True)
        Dim input As String
        Dim pattern As String = "^\b(\w+)\s+\b(?<word>\w+)\s+"

        Do While sr.Peek() >= 0
            input = sr.ReadLine()
            Dim rgx As New Regex(pattern, RegexOptions.IgnoreCase)
            Dim matches As MatchCollection = rgx.Matches(input)
            For Each match As Match In matches
                Dim groups As GroupCollection = match.Groups
                memWriter.WriteLine(groups.Item("word").Value)
            Next
        Loop
        sr.Close()
        memWriter.Close()

    End Sub
'-----------------------------------------------------------------------------------
End Class
```
Both seem to work the way you want.

Button1.Click:
Checks whether the line is tab or space delimited. It reads in the line and splits it on tabs.
If the line cannot be split, then *memcontents.Length = 1*, meaning the line is still whole, so it is split again on spaces.
The biggest problem with this is a line that contains both spaces and tabs.
The procedure also depends on the line containing a "_TYPE" string.

Button2.Click:
Uses a regular expression. It looks at:
- the start of the line (^),
- a word boundary, i.e. the beginning of a word (\b),
- the first word, as one or more word characters (\w+),
- one or more whitespace characters (\s+), which covers both tabs and spaces,
- another word boundary, the beginning of the next word (\b),
- (?<word>\w+), which captures the next word (your ID) into a named group called "word",
- then more whitespace (\s+), so the ID must be followed by at least one space or tab.

It does not matter whether "_TYPE" is in the string, nor how many tabs or spaces you use to delimit your text files.
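To see the pattern in isolation, here is a small sketch that applies it to single strings instead of a file. The module name `RegexIdDemo`, the function name `ExtractId` and the sample lines are assumptions for illustration; the pattern itself is the one from the code above.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexIdDemo

    ' Same pattern as above: first word, whitespace,
    ' second word captured as "word", then more whitespace.
    Private ReadOnly IdPattern As New Regex("^\b(\w+)\s+\b(?<word>\w+)\s+")

    ' Returns the second word of the line, or Nothing when the pattern does not match.
    Function ExtractId(line As String) As String
        Dim m As Match = IdPattern.Match(line)
        If Not m.Success Then Return Nothing
        Return m.Groups("word").Value
    End Function

    Sub Main()
        ' Several spaces between the fields
        Console.WriteLine(ExtractId("MEM_TYPE   11223   Jbloggs"))
        ' Tabs between the fields
        Console.WriteLine(ExtractId("MEM_TYPE" & ControlChars.Tab & "12345" & ControlChars.Tab & "Fbloggs"))
        ' Nothing follows the second word, so the trailing \s+ cannot match
        Console.WriteLine(ExtractId("MEM_TYPE 99999") Is Nothing)
    End Sub

End Module
```

The last call shows the one catch: a line that ends straight after the ID is not matched, because the pattern requires whitespace after the second word.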


----------



## sdhayes (Apr 14, 2012)

Thanks for these posts. I had read up on Regex but honestly I couldn't get my head around it, except for a really simple one I've used elsewhere in the program.

These should give me plenty to go on - thanks :smile:


----------



## sdhayes (Apr 14, 2012)

Albert - tried the second and it works exactly for what I need - thanks.

Is there a good online reference for regex examples? I can see some other uses for it that would be helpful if I can learn the syntax. Google brings up various pages, but I'm looking for something for beginner dummies!

Thanks again.


----------



## AlbertMC2 (Jul 15, 2010)

Try:

- RegExLib.com Regular Expression Cheat Sheet (.NET Framework)
- Character Escapes
- Character Classes

For regular expressions I always test them out in Microsoft Word. You just use the Find feature (with the "Use wildcards" option).
There are a few differences, e.g. the (?<word>\w+) reference, but most of the features are the same.


----------



## AlbertMC2 (Jul 15, 2010)

sdhayes via PM said:


> Hiya,
> 
> Thanks for the help with Regex the other week. Could you confirm if the following should return the 3rd word in the line:
> 
> ...


You can try:

```
"^\b\w+\s+\b\w+\s+(?<word>\w+)"
```


----------



## AceInfinity (Jan 21, 2012)

You could also do something like this:

```
Array.ForEach(Of String)(File.ReadAllLines("D:\file.txt"), Sub(ln) Console.WriteLine(ln.Split(New Char() {Chr(9), " "c}, StringSplitOptions.RemoveEmptyEntries)(1)))
```
This gets the second value on each line (the id) regardless of whether the fields are separated by spaces or tabs. The comparison is then easy.
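One caveat with indexing `(1)` directly is that a blank or single-field line throws an index-out-of-range exception. A guarded variant of the same `Split` call might look like the sketch below; the module name `SplitIdDemo`, the function name `ReadIds`, the temp file name and the sample lines are all assumptions for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module SplitIdDemo

    ' Split on spaces and tabs alike.
    Private ReadOnly Delimiters As Char() = {" "c, ControlChars.Tab}

    ' Reads a file and returns the second field of every line that has one.
    ' Lines with fewer than two fields are skipped instead of throwing.
    Function ReadIds(filePath As String) As List(Of String)
        Dim ids As New List(Of String)()
        For Each line As String In File.ReadAllLines(filePath)
            Dim parts As String() = line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries)
            If parts.Length >= 2 Then ids.Add(parts(1))
        Next
        Return ids
    End Function

    Sub Main()
        ' Hypothetical temp file standing in for a member file.
        Dim tempFile As String = Path.Combine(Path.GetTempPath(), "mem_demo.txt")
        File.WriteAllLines(tempFile, New String() {
            "BLOGGSJ 4909981 SUBS",
            "BLOGGSK" & ControlChars.Tab & "5592267" & ControlChars.Tab & "COACH FEE",
            "",
            "HEADER"})

        Dim ids As List(Of String) = ReadIds(tempFile)
        Dim newId As String = "5592267"
        If ids.Contains(newId) Then
            Console.WriteLine("Member ID " & newId & " is already in use.")
        End If
        File.Delete(tempFile)
    End Sub

End Module
```

`StringSplitOptions.RemoveEmptyEntries` is what keeps the id in position 1 however many spaces or tabs sit between the fields.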


----------

