Friday, July 29, 2005

test #3

java -Xmx1024m -classpath /home/russ/.maven/repository/junit/jars/junit-3.8.1.jar:/home/russ//.maven/repository/ims/jars/imsobjects-1.2.0-SNAPSHOT.jar:/home/russ/.maven/repository/db4o/jars/db4o-4.5-java1.4.jar:. junit.textui.TestRunner org.lds.ims.objects.TestDb4o
.Loading database with: 1000000 ids(2x), names, people

502640ms elapsed for task
Looking up Person by id 113322
1ms elapsed for task
Looking up Name from Person
[db4o 4.5.009 2005-07-29 10:48:45]
Uncaught Exception. Engine closed.
[db4o 4.5.009 2005-07-29 10:48:45]
Please mail the following to info@db4o.com:

java.lang.OutOfMemoryError

Closing database
39ms elapsed for task
E
Time: 569.755
There was 1 error:
1) testLookupByRfn(org.lds.ims.objects.TestDb4o)java.lang.RuntimeException: Uncaught Exception. db4o engine closed.
at com.db4o.YapStream.fatalException(Unknown Source)
at com.db4o.YapStream.get1(Unknown Source)
at com.db4o.YapStream.get(Unknown Source)
at org.lds.ims.objects.TestDb4o.testLookupByRfn(TestDb4o.java:152)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

FAILURES!!!
Tests run: 1, Failures: 0, Errors: 1

------
public void setUp(){
    //index settings are configured before the database file is opened
    Db4o.configure().objectClass(TId.class).objectField("idValue").indexed(true);
    Db4o.configure().objectClass(TName.class).objectField("nameValue").indexed(true);
    Db4o.configure().objectClass(TName.class).objectField("person").indexed(true);
    Db4o.configure().objectClass(TBasicPerson.class).objectField("imsId").indexed(true);

    database = Db4o.openFile(DB_FILENAME);
    setupDatabaseWithPeople(NUM_OF_PEOPLE_TO_LOAD);
}
-----

---loading code---
IMSID ims = new IMSID("" + (peopleCntr * 17));

TBasicPerson newDude = new TBasicPerson();
newDude.setImsId(ims);
database.set(newDude);

TId imsId = new TId();
imsId.setIdType(IdType.IMS_ID);
imsId.setIdValue(ims.toString());
imsId.setPerson(newDude);

database.set(imsId);

TId rfn = new TId();
rfn.setIdType(IdType.RFN);
rfn.setIdValue("" + (peopleCntr * 31));
rfn.setPerson(newDude);

database.set(rfn);

TName name = new TName();
name.setNameValue("Some Test dude name" + (peopleCntr * 19));
name.setNameType(NameType.CMIS_NAME);
name.setPerson(newDude);

database.set(name);
---end loading code---

---tests---
TId queryId = new TId();
queryId.setIdValue("113322");

Query query = database.query();
query.constrain(TId.class);

ObjectSet results = database.get(queryId);

beginLoggedEvent("Looking up Person by id " + queryId.getIdValue());
assertTrue(results.size() == 1);
TBasicPerson person = ((TId)results.next()).getPerson();
endLoggedEvent();

beginLoggedEvent("Looking up Name from Person");
TName queryName = new TName();
queryName.setPerson(person);
results = database.get(queryName);
assertTrue(results.size() == 1);
System.out.println("Name is: " + ((TName)results.next()).getNameValue());
endLoggedEvent();
---end tests---

Thursday, July 28, 2005

db4o second test

Using a BasicPerson, TestId and TestName pojos:
Loading database with: 1000000 ids(2x), names, people
just reached: 0 in 0
just reached: 10000 in 5235
just reached: 20000 in 9609
just reached: 30000 in 14014
just reached: 40000 in 18555
just reached: 50000 in 24294
just reached: 60000 in 29458
just reached: 70000 in 34711
just reached: 80000 in 38898
just reached: 90000 in 43103
just reached: 100000 in 48126
just reached: 110000 in 52073
just reached: 120000 in 56077
just reached: 130000 in 60128
just reached: 140000 in 64137
just reached: 150000 in 69626
just reached: 160000 in 73657
just reached: 170000 in 77673
just reached: 180000 in 81744
just reached: 190000 in 85833
just reached: 200000 in 89924
just reached: 210000 in 93986
just reached: 220000 in 98120
just reached: 230000 in 102249
just reached: 240000 in 108154
just reached: 250000 in 112670
just reached: 260000 in 117505
just reached: 270000 in 122052
just reached: 280000 in 126539
just reached: 290000 in 131076
just reached: 300000 in 136139
just reached: 320000 in 146089
just reached: 330000 in 150771
just reached: 340000 in 155347
just reached: 350000 in 159983
just reached: 360000 in 164586
just reached: 370000 in 169100
just reached: 380000 in 176931
just reached: 390000 in 181569
just reached: 400000 in 186232
just reached: 410000 in 190950
just reached: 420000 in 195867
just reached: 430000 in 200507
just reached: 440000 in 205159
just reached: 450000 in 209983
just reached: 460000 in 214668
just reached: 470000 in 219328
just reached: 480000 in 224170
just reached: 490000 in 228886
just reached: 500000 in 233544
just reached: 510000 in 238384
just reached: 520000 in 243180
just reached: 530000 in 247650
just reached: 540000 in 252316
just reached: 550000 in 256820
just reached: 560000 in 261308
just reached: 570000 in 266162
just reached: 580000 in 270585
just reached: 590000 in 275104
just reached: 600000 in 279884
just reached: 610000 in 289737
just reached: 620000 in 294251
just reached: 630000 in 299066
just reached: 640000 in 303539
just reached: 650000 in 308344
just reached: 660000 in 312893
just reached: 670000 in 317665
just reached: 680000 in 322080
just reached: 690000 in 326708
just reached: 700000 in 331115
just reached: 710000 in 335714
just reached: 720000 in 340175
just reached: 730000 in 344840
just reached: 740000 in 349324
just reached: 750000 in 353799
just reached: 760000 in 358612
just reached: 770000 in 363367
just reached: 780000 in 367962
just reached: 790000 in 372655
just reached: 800000 in 377622
just reached: 810000 in 382321
just reached: 820000 in 387158
just reached: 830000 in 391964
just reached: 840000 in 396887
just reached: 850000 in 401993
just reached: 860000 in 406987
just reached: 870000 in 411791
just reached: 880000 in 416510
just reached: 890000 in 421543
just reached: 900000 in 426205
just reached: 910000 in 431058
just reached: 920000 in 435586
just reached: 930000 in 440384
just reached: 940000 in 455117
just reached: 950000 in 459988
just reached: 960000 in 464647
just reached: 970000 in 469666
just reached: 980000 in 474506
just reached: 990000 in 486817
499153ms elapsed for task
Closing database
Total objects added: 2mil Ids + 1mil names + 1mil persons = 4mil objects

Next to test lookup on FKs.
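
The progress lines above can be turned into throughput figures. The following is only a sketch: it assumes every progress line has the exact shape "just reached: N in MS", and the class and method names (LoadLogStats, parse, peoplePerSecond) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoadLogStats {
    // Matches progress lines such as "just reached: 10000 in 5235".
    private static final Pattern PROGRESS =
            Pattern.compile("just reached: (\\d+) in (\\d+)");

    /** Returns {peopleLoaded, elapsedMillis}, or null if the line is not a progress line. */
    static long[] parse(String line) {
        Matcher m = PROGRESS.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    /** People loaded per second between two progress samples. */
    static double peoplePerSecond(long[] from, long[] to) {
        long people = to[0] - from[0];
        long millis = to[1] - from[1];
        return people * 1000.0 / millis;
    }

    public static void main(String[] args) {
        // Sample lines copied from the log above.
        long[] start = parse("just reached: 0 in 0");
        long[] early = parse("just reached: 10000 in 5235");
        long[] late = parse("just reached: 980000 in 474506");
        long[] last = parse("just reached: 990000 in 486817");

        System.out.println("first 10k, people/s: " + peoplePerSecond(start, early));
        System.out.println("last 10k, people/s:  " + peoplePerSecond(late, last));
        // Each person is stored together with two ids and one name: four objects.
        System.out.println("overall, objects/s:  " + 4 * peoplePerSecond(start, last));
    }
}
```

Comparing the first and last 10k batches gives a rough idea of whether inserts slow down as the indexed database grows.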

Friday, July 22, 2005

db4o initial testing

Update
Thanks to Carl (and a bit more reading of the tutorial) I updated my test to index on a single field and then queried on that field. New results:

Million records took: 128966 ms
Fetching a record from a million: 5 ms
Deleting a million: 132258 ms

*Very* impressive. Perhaps I'll get more creative with a larger POJO and evaluate further metrics. The numbers for creating and deleting records are larger due to more processes running at the time of the test.
---------------

Dell D800 1.7 PM, 2G ram, JVM 1.4.2.06, from maven using JUnit

Dumped in a million objects (4 field POJO) and the db file grew to 64MB.

Million records took: 48195 ms
Fetching a record (QBE) from a million: 12460 ms
Deleting a million (result set obtained by Query): 48234 ms





Thursday, July 21, 2005

Importance of UTC date normalization in event triggers.

One important aspect of internationalizing an application is date handling: how will the application read and write dates? If the project is limited to viewing dates only, the point is moot. However, if these dates drive some activity or event, then "normalizing" the dates should be seriously considered. Choosing an international standard (UTC) and conforming date entries to it allows date-specific operations and triggers to be executed precisely.

Consider the following example. An organization has dates that drive specific events or triggers. A monitor/daemon runs periodically, evaluates these dates and fires the appropriate action. Presume the monitor wakes once a day at midnight MST (UTC-7). An organization in New Zealand (UTC+12 in July) that depends on the date of 24 July to trigger some action will then see that action delayed by 19 hours, and the people/systems dependent upon those date-driven operations will be hosed. Here's how:

    00:00 Denver July 23 the monitor awakes. Monitor evaluates all dates
    for 23 July and invokes appropriate operations.
    The time difference is 19 hours, so 00:00 Denver July 23 is 19:00 NZ
    July 23. Five hours later (05:00 Denver July 23) it is already 00:00
    NZ July 24, but the monitor is asleep.
    The New Zealand organization is not processed until the monitor
    wakes again at 00:00 Denver July 24, which is 19:00 NZ July 24.
    So the effective date processing occurs 19 hours after July 24
    began in New Zealand.

Alternatively, if the effective date is stored as a timestamp (even though only the "date" is really relevant, given the requirement to run operations based on "that day") and the monitor runs hourly, the problem is solved. The New Zealand "date" of 00:00 July 24 is really 12:00 July 23 UTC. If the monitor evaluates these timestamps every hour, the application can address the international needs of effective date handling. Triggers and date-dependent operations are then delayed by at most an hour, and at midnight in any given timezone the correct operations for that timezone's day are executed.
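
Working the example through in code gives concrete numbers. This is a minimal sketch using java.time (Java 8 or later), not the monitor itself. It assumes Denver is a fixed MST offset of UTC-7, as in the example, and the names EffectiveDateDemo, startOfDay and nextHourlyRun are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class EffectiveDateDemo {
    // Assumption: Denver is modeled as fixed MST (UTC-7), matching the example.
    static final ZoneId DENVER = ZoneOffset.ofHours(-7);
    static final ZoneId NEW_ZEALAND = ZoneId.of("Pacific/Auckland");

    /** Normalizes "the start of this calendar date in this zone" to a UTC instant. */
    static Instant startOfDay(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone).toInstant();
    }

    /** First top-of-the-hour monitor run at or after the effective instant. */
    static Instant nextHourlyRun(Instant effective) {
        Instant floor = effective.truncatedTo(ChronoUnit.HOURS);
        return floor.equals(effective) ? floor : floor.plus(1, ChronoUnit.HOURS);
    }

    public static void main(String[] args) {
        LocalDate effectiveDate = LocalDate.of(2005, 7, 24);

        // When July 24 really begins for the New Zealand organization.
        Instant nzMidnight = startOfDay(effectiveDate, NEW_ZEALAND);
        // When a once-a-day Denver-midnight monitor first evaluates July 24.
        Instant denverMidnight = startOfDay(effectiveDate, DENVER);

        System.out.println("NZ effective instant (UTC): " + nzMidnight);
        System.out.println("Daily monitor delay (hours): "
                + Duration.between(nzMidnight, denverMidnight).toHours());
        System.out.println("Hourly monitor delay (minutes): "
                + Duration.between(nzMidnight, nextHourlyRun(nzMidnight)).toMinutes());
    }
}
```

With these assumptions the effective instant is 2005-07-23T12:00:00Z, the once-a-day monitor is 19 hours late, and the hourly monitor fires on time. Using the "America/Denver" zone instead of the fixed offset would apply daylight time in July and shorten the daily delay by an hour.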

Thursday, July 07, 2005

more file processing stats

So I can't get over the fact that I want to write the GUI in Swing. I had the idea to use Ruby on the back end to get the list of files, then write the whole list to a file. Then use Java to pick up the file, parse it, search it (per user input) for the file and use Swing as the GUI! All, of course, within a Ruby script. Sound impossible? Check it out:


time ruby finderPrinter.rb
17370

real 0m0.598s
user 0m0.427s
sys 0m0.164s

So that's 17370 files, written to disk in .598 seconds! Beautiful! File is ~1.5 meg. How about how fast Java can process the file?


java -classpath . JReader
17370
took 169 milliseconds
Looked for 'a' in all files: 17258 169 milliseconds


Incredible. Here's the java code:

//needs java.io.BufferedReader, java.io.FileReader, java.io.FileNotFoundException,
//java.io.IOException and java.util.Iterator
public void readFile(String fileName) throws FileNotFoundException{
    //see how long it takes to read in the file
    //iterate the list
    //then search through the list for a given name
    long timeStarted = System.currentTimeMillis();
    BufferedReader reader = new BufferedReader(new FileReader(fileName));
    String currentLine = null;
    try{
        //readLine() returns null at end of file
        while((currentLine = reader.readLine()) != null){
            fileList.add(currentLine);
        }
        reader.close();
    }
    catch(IOException ex){
        ex.printStackTrace();
    }

    for(Iterator fileIter = fileList.iterator();fileIter.hasNext();){
        String filePath = (String)fileIter.next();
        //indexOf returns 0 for a match at the very start, so test >= 0
        if(filePath.indexOf("a") >= 0){
            resultList.add(filePath);
        }
    }
    System.out.println(fileList.size());
    System.out.println("took " + ((System.currentTimeMillis() - timeStarted)) + " milliseconds");
    System.out.println("Looked for 'a' in all files: " + resultList.size() + " " + ((System.currentTimeMillis() - timeStarted)) + "milliseconds");
}


Pretty simple, but that's exactly what I want. The two Collections (fileList and resultList) are ArrayLists. If it operates this quickly, it *should* be acceptable to get a GUI in place that can execute with these ideas. Real kicker is to run the Ruby script that then kicks off the java code...


Finding files
found
17373
done writing file, now kicking off java process
17373
took 143 milliseconds
Looked for 'a' in all files: 17260 143milliseconds

real 0m0.895s
user 0m0.630s
sys 0m0.211s

Acceptable? Just have to code the gui to find out.
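
The Java code above times the read and the search together, so both printed numbers are the same. A possible variation, sketched here for Java 8 or later, times the two phases separately. FileListSearch, readList and search are hypothetical names, and the sketch assumes the list file is UTF-8 with one path per line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FileListSearch {
    /** Reads the whole list file into memory, one path per line (assumes UTF-8). */
    static List<String> readList(Path listFile) {
        try {
            return Files.readAllLines(listFile, StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Returns every path that contains the search text anywhere in it. */
    static List<String> search(List<String> paths, String needle) {
        List<String> hits = new ArrayList<>();
        for (String path : paths) {
            if (path.contains(needle)) {
                hits.add(path);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java FileListSearch <list-file>");
            return;
        }
        long t0 = System.nanoTime();
        List<String> paths = readList(Paths.get(args[0]));
        long t1 = System.nanoTime();
        List<String> hits = search(paths, "a");
        long t2 = System.nanoTime();

        System.out.println(paths.size() + " paths read in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println(hits.size() + " matches found in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Splitting the timing this way would show whether the GUI should worry about the file read (done once) or the search (done on every keystroke).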

Wednesday, July 06, 2005

file fetching stats

I ported the Ruby code to Python and Java for the file finder app. I'd love to use Swing to get out of the GTK pain, but the obesity of the JDK is going to hinder that route.

Here are the results:

time python test.py
18184
--------------------
real 0m0.649s
user 0m0.465s
sys 0m0.182s

time ruby test.rb
19948
--------------------
real 0m0.598s
user 0m0.383s
sys 0m0.195s
--------------------
java -classpath . SimpleTester
19946 took 37 seconds


Couple of interesting things to note:

First, the code running each of these tests is identical in nature: it's a simple recursive walk of a directory tree. The Python code came up with significantly fewer (~1800) files because it ignores symlinks. Wonder what two files Java didn't account for that Ruby did.

Second, the Java collection used in the test is a HashSet containing the file names (Strings). Changing the implementation to use an ArrayList of Files added a second or two onto the test.
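
How symlinks are treated is enough to change the count, as the Python numbers above suggest. The following is a rough sketch of a walk with that choice made explicit, written against java.nio.file (Java 8 or later). It is not the SimpleTester code from this test, and TreeWalkCount and countFiles are made-up names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class TreeWalkCount {
    /**
     * Counts regular files under root. With followLinks == false, symlinks are
     * neither traversed nor counted; with true, they are resolved and followed.
     */
    static long countFiles(Path root, boolean followLinks) {
        FileVisitOption[] walkOptions = followLinks
                ? new FileVisitOption[] { FileVisitOption.FOLLOW_LINKS }
                : new FileVisitOption[0];
        LinkOption[] linkOptions = followLinks
                ? new LinkOption[0]
                : new LinkOption[] { LinkOption.NOFOLLOW_LINKS };

        // Files.walk must be closed; a symlink cycle or unreadable directory
        // surfaces as an UncheckedIOException while the stream is consumed.
        try (Stream<Path> paths = Files.walk(root, walkOptions)) {
            return paths.filter(p -> Files.isRegularFile(p, linkOptions)).count();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("ignoring symlinks:  " + countFiles(root, false));
        System.out.println("following symlinks: " + countFiles(root, true));
    }
}
```

Running both modes against the same tree shows how much of a gap between two implementations comes from link handling alone.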

I should write this also in C++ and post results. For now Ruby is the clear winner and my hunch held true that it's the slimmest and fastest of the three.

Test specs:
Dell D800 P4M 1700mhz
2G Ram
reiserfs
Linux rr800 2.6.11-gentoo-r4 #5 Tue May 3 08:32:04 MDT 2005 i686 Intel(R) Pentium(R) M processor 1700MHz GenuineIntel GNU/Linux