Using Phonetic Matching To Move Excel Data Into A Visual FoxPro Database
Abstract
String and name matching can provide powerful capabilities for
easily identifying records that belong to specific individuals. This can prove
very useful when moving legacy data, such as Microsoft Excel spreadsheets,
into an existing database. In Visual FoxPro, you could use either the
SOUNDEX() or the DIFFERENCE() function for this purpose. This article
demonstrates a brief application of the DIFFERENCE() function and its use in
matching names to retrieve account IDs from a Visual FoxPro table. It also
demonstrates how data previously stored in an Excel spreadsheet is moved
into tables in a Visual FoxPro database.
Introduction
Many situations in real-life applications require string
matching. For example, you may know the name of a person but not that
person’s unique Account ID. Making an exact match on the first name, or
on any other name for that matter, may not work, especially in countries where
there is no standard spelling of names. For example, in Ethiopia, the same name
could be spelt in three different ways. Take the following examples:
In this situation, you can use string matching to find out whether names are
phonetically similar, and then decide whether you have found the
right person. Using Visual FoxPro’s DIFFERENCE() function, you can
determine whether two words sound the same. This function rescued us
in just such a situation, with a consistent accuracy of 98% or above.
This meant that the only way to match a mark entry on the spreadsheet to
the correct student record in the system was by matching the names. It was not
practical to make the teachers update the spreadsheet because of the sheer
number of homerooms and students involved. Moreover, because
most of the teachers knew MS Excel and had already prepared their rosters
in Excel, this was the most effective way to get the data directly
from the Excel spreadsheet into the FoxPro database.
SET EXACT ON
DIMENSION arrSubjs(22,1)
STORE 0 TO intRows,intCols,intCnt,intStudsInClass,intTerm,intAge,intRecNumb,nScore,;
    nExamTotal,nExamAvg,intRankInClass,intDays
STORE 0 TO intFirstRank,intSecRank,nParamVal1,nTotPossMarks
STORE "" TO cFile,cMsg,cCurrName,cSex,cBatchCode,cRC,cDropOut,cPromoted,cDetained,;
    cBehaviour,cFirstName,cMiddleName,cStudName
STORE "" TO cFirstName,cLastName,cRFirstName,cRMiddleName,cBatchCode,cParamVal2,cStudCode
These variables hold data obtained from the Excel
spreadsheet so that it can be evaluated prior to conversion. The
variable oEX holds the reference to the Microsoft Excel object, and
cFile holds the name and path of the Microsoft
Excel file to be processed. The variable intRows represents the total
number of rows in the Excel spreadsheet, while intCols represents the
total number of columns to be processed. Another important
variable here is the array arrSubjs, declared with the
DIMENSION command. It holds the subject header details
read back from the spreadsheet, which are later appended
to the cursor TSubjs using the APPEND FROM ARRAY command. Of
course, you will have to create an interface so that users can
enter the academic year, grade and class to be
processed. Our interface looks something like Fig 1.
Fig 1: The End of Term Report form that users may use to enter the data
they want. The grade to be processed is entered in the starting grade
(spnStartGrade) field, and the specific section in the Class field (txtClass).
The academic year is entered in the Academic Year fields
(txtAcdStartYear and txtAcdEndYear) and the user simply clicks the
In this case, the GETFILE() function displays the Open File dialog box so
that the user can select an Excel file. The GetObject() function opens
the file and creates an Excel automation object from it. The reference
to the object is stored in the object variable oEX. Now that we have
an object reference, we can set and obtain property
values for the object just as we would with any other object.
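A minimal sketch of the two calls described above might look like the following. Only GETFILE(), GETOBJECT() and the variables cFile and oEX come from this article; the dialog captions and the last line, which prints the name of the active sheet, are illustrative assumptions. It requires Microsoft Excel to be installed.

```foxpro
* Hypothetical sketch: let the user pick a workbook and attach to it.
LOCAL cFile, oEX

* GETFILE() returns an empty string when the user cancels the dialog
cFile = GETFILE("xls", "Marks list:", "Open", 0, "Select the Excel marks list")
IF EMPTY(cFile)
    RETURN
ENDIF

* GETOBJECT() opens the file and returns an automation reference to it
oEX = GETOBJECT(cFile)

* The reference behaves like any other object, so properties can be read
? oEX.ActiveSheet.Name
```

Checking the return value of GETFILE() before calling GETOBJECT() avoids an automation error when no file was chosen.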
3. Step 3: We need to determine the subjects that will be processed. This
is important because different school years take different subjects.
The fifth row on the spreadsheet as shown contains the list of subjects.
Since students can be graded only for subjects that have already been
defined in the school management system database, we need to
read these back from the spreadsheet and then obtain the correct
subject IDs from the Subjects table within the application system. Also,
we check the teacher’s timetable defined in the system to
obtain the name and ID of the teacher taking the subject for the
defined school year. In the piece of code shown below, a FOR ...
ENDFOR loop iterates over the cells of the Excel spreadsheet. In each iteration
through the loop, the array arrSubjs is updated with the value returned
from the Excel spreadsheet. You can create a temporary table to hold
the list of subjects by entering the following code:
* Now determine the subjects you will be processing, read them from
* the Excel spreadsheet into the array and then create a cursor with them
FOR intCols = 7 TO 28 && These columns contain Column Numbers and Dates
    cMsg = oEX.ActiveSheet.Cells(4,intCols).Value
    IF RTRIM(UPPER(cMsg)) <> "T REMARK"
        *intCnt = intCnt = 1
        *DIMENSION arrSubjs(intCnt,1)
        arrSubjs(intCols - 6 + 1,1) = cMsg
        *arrSubjs(intCnt,1) = cMsg
    ENDIF
ENDFOR
* Create a cursor and put the subjects you found into it to make them
* easier to process
CREATE CURSOR TSubjs (CourseCode c(20),TotPossMarks N(3,2),Curricular L,;
    CourseName c(50),EmployeeCode c(15),EmployeeName c(50))
SELECT TSubjs
APPEND FROM ARRAY arrSubjs
USE SySubjects IN 0
USE AdmTeachSubj IN 0
SELECT TSubjs
GO TOP
SCAN
    SELECT SySubjects
    GO TOP
    LOCATE FOR ALLTRIM(UPPER(SySubjects.CourseName)) = ;
        ALLTRIM(UPPER(TSubjs.CourseCode))
    IF FOUND()
        REPLACE TSubjs.CourseCode WITH SySubjects.CourseCode
        REPLACE TSubjs.CourseName WITH SySubjects.CourseName
        REPLACE TSubjs.TotPossMarks WITH SySubjects.TotPossMarks
        REPLACE TSubjs.Curricular WITH SySubjects.Curricular
    ENDIF
    SELECT AdmTeachSubj
    GO TOP
    LOCATE FOR ALLTRIM(UPPER(AdmTeachSubj.CourseCode)) = ;
        ALLTRIM(UPPER(TSubjs.CourseCode)) ;
        AND AdmTeachSubj.Grade = THISFORM.spnStartGrade.Value ;
        AND AdmTeachSubj.Class = THISFORM.txtClass.Value ;
        AND AdmTeachSubj.CurrYear = THISFORM.txtAcdStartYear.Value ;
        AND AdmTeachSubj.NextYear = THISFORM.txtAcdEndYear.Value
    IF NOT FOUND()
        THISFORM.lblstatus.Caption = "*** Stop Error-Subject: " + ;
            RTRIM(TSubjs.CourseCode) + " has not been defined for Grade: " ;
            + ALLTRIM(STR(THISFORM.spnStartGrade.Value)) + ;
            THISFORM.txtclass.Value
        *strMsg = "This subject has not been defined for this class!"
        *MESSAGEBOX(strMsg,MBINFO,chrProgTitle)
        *RETURN
    ELSE
        REPLACE TSubjs.EmployeeCode WITH AdmTeachSubj.EmployeeCode
        REPLACE TSubjs.EmployeeName WITH AdmTeachSubj.EmployeeName
    ENDIF
    SELECT TSubjs
ENDSCAN
USE IN SySubjects
USE IN AdmTeachSubj
4. Step 4: Create a temporary cursor into which you will then read back
the records from the Excel spreadsheet. The versatile CREATE CURSOR
statement achieves this because it allows you to define the exact fields
to be contained in the temporary table as well as their attributes. Once
you have created the temporary table, you can perform all types of
operations on it, such as APPEND (add new records), DELETE
(remove existing records), LOCATE (search for and match
existing records) and so on. We used CREATE CURSOR
instead of CREATE TABLE because Visual FoxPro removes the
temporary table once it is closed. Beautiful, is it not?
This was created with the following commands:
* Now begin the actual process of reading the data from the array into
* the system
* 1) Create a Cursor to hold the records you are reading back
CREATE CURSOR TStudMarks(BatchCode c(20),StudCode c(20),StudName c(20),;
    Sex c(1),BirthDate D,Age I,Grade I,ClassCode c(1),AcdStartYear I,;
    AcdEndYear I,Term I,;
    CourseCode c(10),CourseName c(25),Employeecode c(15),EmployeeName c(50),;
    TotPossMarks N(3,2) DEFAULT 100,Curricular L,;
    ExamMark N(3,2) DEFAULT 0,ExamAvg N(3,2),Behaviour c(1),NoOfStuds I,;
    RankInClass I DEFAULT 0,GradeCode c(2),;
    NoteCode c(2),PromStatus c(10),DaysAbs I DEFAULT 0)
6. Step 6: Read the records from the Excel Spreadsheet and update the
TStudMarks Cursor created earlier. Once you have populated this table,
you can update the Students master marks list table. This was
achieved with the following piece of code:
The subject headers are in the cursor TSubjs. TSubjs contains all
entries on Row 4 from Column 6 to Column 27. If you
examine the code that created and populated the cursor, you will
notice that the code does not discriminate between an actual subject
header and a teacher remark (though it ought to). This means that
your TSubjs cursor looks like this:
RecNo  CourseCode
1      Amharic
2      T Remark
3      English
4      T Remark
5      Math
6      T Remark
7      Biology
8      T Remark
9      Chemistry
10     T Remark
11     Physics
12     T Remark
13     History
14     T Remark
15     Geography
16     T Remark
17     Computer
18     T Remark
19     Civics
20     T Remark
21     H & PE
22     T Remark
This table shows that each subject has two entries: the subject itself
and its teacher remark (T Remark) code. In the above piece of code,
we use the modulus operator (% or the MOD() function), which returns a remainder. So
if the value of intCols % 2 is zero (0), then we need to add one to the
value returned to obtain the right subject header in the TSubjs cursor;
otherwise we use the value returned as it is. The statement GO
intRecNumb moves us to the correct row in the TSubjs cursor so that
the correct data is retrieved and stored in memory variables such as
cCourseCode, cCourseName, etc. for easy usage.
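The modulus rule can be sketched as a small helper. This is only an illustration under stated assumptions: the function name SubjRecNo is hypothetical, and the offset of 6 assumes that the subject headers start at column 6, so that a subject column and the T Remark column beside it both resolve to the record holding the subject header.

```foxpro
* Hypothetical sketch: find the TSubjs record that holds the subject
* header for a given spreadsheet column.
* Assumption: the headers start at column 6, so the offset is 6.
? SubjRecNo(6)    && subject column: record 1
? SubjRecNo(7)    && its T Remark column: also record 1
? SubjRecNo(8)    && next subject column: record 3

FUNCTION SubjRecNo(intCols)
    LOCAL intRecNumb
    intRecNumb = intCols - 6
    IF intCols % 2 = 0
        * Even column: add one to land on the subject header row
        intRecNumb = intRecNumb + 1
    ENDIF
    RETURN intRecNumb
ENDFUNC
```

The value returned would then be passed to GO to position the TSubjs cursor before the subject details are copied into memory variables.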
The next statement will process all rows for columns 6 through 28
(intCols >= 6 AND intCols <= 28). It will read back the score for each
subject from the Excel spreadsheet, storing the value in the
memory variable nScore, and the exam ranking for each student,
storing it in the memory variable intRankInClass, and so on.
Once the relevant information has been read back, the lines SELECT
TStudMarks, APPEND BLANK and those that follow populate the
TStudMarks cursor. This is the first step: populating the TStudMarks
cursor with the records found in the Excel spreadsheet. The next step
is to populate the TStudMarks cursor with the correct IDs and birth
dates from the TStuds cursor, which contains a list of students drawn
from the Students Master List table AcdStudents. Step 7 covers how to
perform this action.
7. Step 7: Obtain the Student IDs from the Students Master Table. As you
will notice, the Excel spreadsheet does not have a column for Student
ID. So we need to use phonetic matching (names that sound the same)
to match the names obtained from the spreadsheet to the names
contained in the Students Master table for the students of the specified
grade and class. Phonetic matching helps us achieve this by using the
DIFFERENCE() function, which returns a number from 0 to 4 to
indicate how similar two words are in pronunciation. The
higher the number returned, the greater the similarity in
pronunciation. This was achieved using the following code:
* Now process the students record by matching first and last names
THISFORM.lblStatus.Caption = "Now matching names and Student IDS...Please wait!"
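One way the matching idea could be sketched is shown below. It is not the listing used in the system: the cursor TDemoStuds, its fields and the sample names are invented for illustration, and scoring the first and last names separately is just one possible design. GETWORDNUM() requires Visual FoxPro 7 or later.

```foxpro
* Minimal, self-contained sketch of name matching with DIFFERENCE().
* Cursor, field names and sample names are illustrative assumptions.
CREATE CURSOR TDemoStuds (StudCode C(20), StudName C(40))
INSERT INTO TDemoStuds VALUES ("ST-001", "Tewodros Bekele")
INSERT INTO TDemoStuds VALUES ("ST-002", "Selamawit Haile")

LOCAL cSheetName, cBestCode, nBest, nScore
cSheetName = "Tewodiros Bekelle"   && spelling as typed on the spreadsheet
cBestCode  = ""
nBest      = -1

SELECT TDemoStuds
SCAN
    * Score first and last names separately and add them (0 to 8)
    nScore = DIFFERENCE(GETWORDNUM(cSheetName, 1), GETWORDNUM(TDemoStuds.StudName, 1)) ;
           + DIFFERENCE(GETWORDNUM(cSheetName, 2), GETWORDNUM(TDemoStuds.StudName, 2))
    IF nScore > nBest
        * Remember the best candidate seen so far
        nBest     = nScore
        cBestCode = TDemoStuds.StudCode
    ENDIF
ENDSCAN

? cBestCode, nBest
USE IN TDemoStuds
```

In practice you would probably accept the best candidate only when its score is above a threshold you choose, and flag the remaining names for manual review.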
8. Step 8: Update the Students Marks List table on your master database
with the details that you have collected into the temporary cursor
TStudMarks. You can do this with the following piece of code:
* Now that we have matched the ID's, we need to update the
* For each record in the cursor, find the matching record
* in the AcdStudMark table. If the record exists then a mark
* entry already exists...make changes to it else just create it
THISFORM.lblStatus.Caption = "Now preparing to update master marks list...Please wait!"
cBatchCode = ""
IF NOT USED("AcdStudMarks")
    USE AcdStudMarks IN 0
ENDIF
SELECT TStudMarks
GO TOP
SCAN
    cMsg = "Now updating " + TStudMarks.StudName + " - " + ;
        TStudMarks.CourseCode + " Term " + ALLTRIM(STR(TStudMarks.Term))
    THISFORM.lblstatus.Caption = cMsg
    SELECT AcdStudMarks
    GO TOP
    LOCATE FOR ALLTRIM(AcdStudMarks.StudentCode) = ;
        ALLTRIM(TStudMarks.StudCode) ;
        AND ALLTRIM(AcdStudMarks.CourseCode) = ;
        ALLTRIM(TStudMarks.CourseCode) ;
        AND AcdStudMarks.Grade = TStudMarks.Grade ;
        AND AcdStudMarks.Class = TStudMarks.ClassCode ;
        AND AcdStudMarks.AcdStartYear = TStudMarks.AcdStartYear ;
        AND AcdStudMarks.AcdEndYear = TStudMarks.AcdEndYear ;
        AND AcdStudMarks.Term = TStudMarks.Term
    IF NOT FOUND() && Record does not exist in the Marks Entry table...create it
    STORE "" TO cParamCode,cParamVal2,cBatchCode
    STORE 0 TO nParamVal1
ENDSCAN
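The update-or-create pattern described in the comments above can be sketched with two small demo cursors. Everything in this sketch (TDemoMarks, TDemoNew, their fields and the sample values) is an assumption made for illustration; the real AcdStudMarks table has more key fields, as the LOCATE condition above shows.

```foxpro
* Sketch of the update-or-create pattern using demo cursors only.
CREATE CURSOR TDemoMarks (StudCode C(20), CourseCode C(10), Term I, ExamMark N(6,2))
INSERT INTO TDemoMarks VALUES ("ST-001", "ENG", 1, 55)

CREATE CURSOR TDemoNew (StudCode C(20), CourseCode C(10), Term I, ExamMark N(6,2))
INSERT INTO TDemoNew VALUES ("ST-001", "ENG", 1, 78)   && entry already exists
INSERT INTO TDemoNew VALUES ("ST-002", "ENG", 1, 64)   && entry is new

SELECT TDemoNew
SCAN
    SELECT TDemoMarks
    LOCATE FOR TDemoMarks.StudCode == TDemoNew.StudCode ;
        AND TDemoMarks.CourseCode == TDemoNew.CourseCode ;
        AND TDemoMarks.Term = TDemoNew.Term
    IF FOUND()
        * A mark entry already exists: change it
        REPLACE ExamMark WITH TDemoNew.ExamMark IN TDemoMarks
    ELSE
        * No mark entry yet: create it
        INSERT INTO TDemoMarks (StudCode, CourseCode, Term, ExamMark) ;
            VALUES (TDemoNew.StudCode, TDemoNew.CourseCode, TDemoNew.Term, TDemoNew.ExamMark)
    ENDIF
    SELECT TDemoNew
ENDSCAN

SELECT TDemoMarks
BROWSE
```

The == operator is used here so that the string comparison is exact regardless of the SET EXACT setting.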
Our Database:
If you want this little system to run, you will also have to reconstruct the
database. We created a Visual FoxPro database (you could give it any name)
and then added the following tables to it:
SyGradeCodes is the table that holds the list of school grades managed by
your school. SyGradeCodes is a part of your data environment.
SyTeacherNotes is a table that holds standard teacher remark codes. These
should generally contain the same codes as those entered in the T Remark
fields on the spreadsheet. These will be verified during the conversion. We
did not add this table to the data environment but instead opened it with a
USE statement. The SySubjects table holds the list of subjects being taught
to students. When we read a subject name from the Excel spreadsheet, we
check this table to obtain its CourseCode, total possible marks (TotPossMarks)
and whether or not it is a curricular subject (Curricular). We did not add this
table to our data environment but just used a USE statement as the need
arose.
These tables should be added to your form’s data environment. AcdStudents
is the Students Master Table. AcdStudMarks holds the students’ school grade
marks for every term. AdmTeachSubj is the teachers’ timetable, used to verify
the name and ID of the teacher taking a specified subject for the given term
(required information on a Report Card printed from the system). We also
have the following tables:
The sample spreadsheet from which the data was imported looks like this:
Conclusion:
After building the form shown above, simply drop a command button onto the
form and then copy and paste all of the code examples shown in this article.
Of course, you would need to build the associated tables as well.
The objective of this article has been to demonstrate the integration and
conversion of existing data in Microsoft Excel spreadsheets into tables in a
Microsoft Visual FoxPro database, where exact matching keys don’t exist, with
a fair degree of accuracy by using the DIFFERENCE() function provided by VFP.
We recognize that the richness of the Visual FoxPro language means that this
could be achieved in many different ways; yet this little write-up contributes a
little to the literature on this subject. This sample can be made more
user friendly in many ways. For example, you may want to give your users the
ability to define the structure of their master marks list Excel spreadsheet
(after all, the format may not remain the same forever, and it may
differ from one school to another) and so on.
The Visual FoxPro programming language is rich and its database engine very
powerful, allowing you to build robust applications quickly. The
phonetic matching capabilities of the DIFFERENCE() function provide a
powerful avenue to integrate existing software and convert existing data from
other formats (in this example, Microsoft Excel) into the application system.
Careful selection of the commands and functions to use can allow you to
build powerful and fast ‘pure fox’ applications.