scottmk [Tue, 6 Jan 2009 20:55:03 +0000 (20:55 +0000)]
1. Make the functions reset_session_buffers() and grab_incoming()
static, since no other source file references them by name.
2. Delete various fragments of dead, obsolete, commented-out code.
3. In init_transport: delete the cleanup code after failure
of previous calls to buffer_init(). It's unreachable, since
buffer_init() can never return NULL.
4. Change transport_session.c so that it uses buffer_add_n()
instead of passing data through an intermediate buffer.
scottmk [Tue, 6 Jan 2009 16:01:41 +0000 (16:01 +0000)]
Tinkering with macros.
1. In OSRF_BUFFER_ADD, OSRF_BUFFER_ADD_CHAR, and OSRF_BUFFER_RESET:
eliminated multiple evaluations of macro arguments.
2. In OSRF_BUFFER_ADD: renamed local variable __tl to _tl, since
identifiers beginning with two underscores are reserved.
3. In OSRF_BUFFER_RESET: applied the do/while(0) trick so that the
macro will work as intended when subject to an "if".
4. Added new macro OSRF_BUFFER_C_STR to return a const pointer to
the internal buffer of a growing_buffer. This new macro will enable
safe and direct access to the buffer contents without violating
encapsulation and without incurring a malloc and free.
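As a rough illustration of items 1 and 3, the sketch below shows a reset macro that evaluates its argument once and is wrapped in do/while(0). The demo_buf type and the DEMO_BUFFER_RESET and reset_if names are hypothetical stand-ins, not the actual OSRF definitions:

```c
#include <string.h>

/* Hypothetical stand-in for growing_buffer; not the real OpenSRF type. */
typedef struct {
    char buf[64];
    size_t n_used;
} demo_buf;

/* The argument is copied into a local pointer once, so an argument with
   side effects (such as &arr[i++]) is evaluated only a single time.
   The do/while(0) wrapper makes the expansion one statement. */
#define DEMO_BUFFER_RESET(gb)            \
    do {                                 \
        demo_buf* _gb = (gb);            \
        if (_gb) {                       \
            _gb->buf[0] = '\0';          \
            _gb->n_used = 0;             \
        }                                \
    } while (0)

/* With a bare { } block instead of do/while(0), the semicolon after the
   macro call would detach the "else" below and break compilation. */
int reset_if(demo_buf* gb, int condition) {
    if (condition)
        DEMO_BUFFER_RESET(gb);
    else
        return 0;
    return 1;
}
```

A call such as DEMO_BUFFER_RESET(&arr[i++]) increments i exactly once under this arrangement.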
scottmk [Mon, 5 Jan 2009 17:50:15 +0000 (17:50 +0000)]
This update boosts the performance of the jsonFormatString function.
1. Replaced the old _tabs function, which required the construction and
destruction of a growing_buffer, with a new append_indentation function,
which adds white space to an existing growing_buffer. This change
eliminates a passel of mallocs and frees.
2. Removed the call to strlen() from the loop condition.
3. Replaced calls to buffer_fadd(), a fairly slow function, with calls
to OSRF_BUFFER_ADD_CHAR() and append_indentation(). Also: replaced a
call to buffer_add_char with the corresponding macro.
4. Eliminated a harmless but wasteful bug that sometimes added
indentation to the end of a line.
In my benchmarking, using a moderately complex JSON string 201
characters long, the new version was seven times as fast as the old.
scottmk [Mon, 5 Jan 2009 17:36:42 +0000 (17:36 +0000)]
This update restructures the mechanism for queueing incoming transport
messages. In addition, the update to transport_client.c rearranges the
logic a bit in client_recv().
1. A transport_message now carries a pointer to be used in a linked list.
It is initialized to NULL when the message is created. We no longer use
a separately allocated list node to carry the message.
2. The queue of transport_messages no longer starts with a dummy node.
3. Instead of finding the tail of the queue by traversing the list from
the head, we maintain a separate pointer to the tail node. Thus the
enqueuing operation occurs in constant time instead of linear time.
4. In client_recv: we now have the dequeueing code in a single place,
instead of duplicating it.
5. In client_recv: I eliminated some conditional compilation that made
no real difference, since both branches of the #ifdef were effectively
identical.
6. In client_recv: changed both loops from while loops to do-while
loops, since in each case we want to perform at least one iteration.
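The queue layout described in items 1 through 3 can be sketched roughly as follows. The demo_msg and demo_queue types and both function names are hypothetical; the real transport_message carries many more fields:

```c
#include <stddef.h>

/* Hypothetical message type with an embedded (intrusive) link. */
typedef struct demo_msg {
    int id;
    struct demo_msg* next;   /* NULL when the message is not queued */
} demo_msg;

typedef struct {
    demo_msg* head;   /* NULL when the queue is empty; no dummy node */
    demo_msg* tail;   /* last node, so enqueueing needs no traversal */
} demo_queue;

/* Append in constant time by way of the tail pointer. */
void demo_enqueue(demo_queue* q, demo_msg* msg) {
    msg->next = NULL;
    if (q->tail)
        q->tail->next = msg;
    else
        q->head = msg;
    q->tail = msg;
}

/* Remove and return the head, or NULL if the queue is empty. */
demo_msg* demo_dequeue(demo_queue* q) {
    demo_msg* msg = q->head;
    if (msg) {
        q->head = msg->next;
        if (!q->head)
            q->tail = NULL;   /* queue became empty */
        msg->next = NULL;
    }
    return msg;
}
```

Because the link lives inside the message, no separate list node has to be allocated or freed.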
scottmk [Mon, 5 Jan 2009 17:05:45 +0000 (17:05 +0000)]
1. In osrf_stack_transport_handler(): removed the memset() as
pointless.
2. Also in osrf_stack_transport_handler(), in the loop traversing
arr[]: changed the loop condition from "i != num_msgs" to
"i < num_msgs", for hygienic reasons.
3. Eliminated osrf_stack_message_handler(). The first half of it moved
into its caller. The second half moved into the two callees
_do_client() and _do_server(). This refactoring made it easier to
eliminate a memory leak where _do_server() was failing to free the
input osrfMessage. I also eliminated a bug whereby we potentially
tried to access a member of a freed osrfMessage.
4. osrf_stack_application_handler() now returns void instead of int,
since we were ignoring the return value anyway.
scottmk [Mon, 5 Jan 2009 14:27:39 +0000 (14:27 +0000)]
Add a new function buffer_add_n(), and the corresponding
function-like macro OSRF_BUFFER_ADD_N.
These facilities append a specified number of characters from
an input string to a growing_buffer. They are intended for
situations where the length of the input string is already known,
or where the input string is not nul-terminated in the right
place.
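A minimal sketch of what a length-bounded append might look like. The demo_gb type, its field names, and the growth policy are assumptions for illustration, not the actual growing_buffer implementation:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified buffer; field names are assumptions. */
typedef struct {
    char* buf;
    size_t n_used;   /* length of the stored string, excluding the nul */
    size_t size;     /* allocated capacity in bytes */
} demo_gb;

/* Append exactly n bytes from data, which need not be nul-terminated.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int demo_add_n(demo_gb* gb, const char* data, size_t n) {
    if (!gb || !data)
        return -1;
    size_t needed = gb->n_used + n + 1;   /* +1 for the terminal nul */
    if (needed > gb->size) {
        size_t new_size = gb->size ? gb->size : 16;
        while (new_size < needed)
            new_size *= 2;
        char* p = realloc(gb->buf, new_size);
        if (!p)
            return -1;
        gb->buf = p;
        gb->size = new_size;
    }
    memcpy(gb->buf + gb->n_used, data, n);   /* no strlen() required */
    gb->n_used += n;
    gb->buf[gb->n_used] = '\0';
    return 0;
}
```

Calling demo_add_n(&gb, "hello world", 5) appends only "hello", which is the case where the input is not nul-terminated in the right place.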
erickson [Mon, 22 Dec 2008 16:12:39 +0000 (16:12 +0000)]
running in no-router/single-service mode was checking a config value that is not required to exist. if we still want to support running in no-router mode, let's make it an explicit configuration option
erickson [Thu, 18 Dec 2008 20:17:57 +0000 (20:17 +0000)]
correctly capture the JID of the backend server process for the session cache. only create a session cache if there is a CONNECT message in the batch. be more aggressive about removing session caches
miker [Sat, 6 Dec 2008 02:28:54 +0000 (02:28 +0000)]
Patch from Scott McKellar:
These patches eliminate several deprecated identifiers in favor of their
camel case equivalents. All these identifiers have already been
eliminated from elsewhere in the source tree:
With these patches I complete my pet project of eliminating redundant and
deprecated identifiers (unless of course I discover others that I hadn't
noticed before).
miker [Sat, 6 Dec 2008 02:24:53 +0000 (02:24 +0000)]
Patch from Scott McKellar:
This patch eliminates the deprecated typedef osrf_message, replacing it
with the camel case equivalent osrfMessage. All other occurrences have
already been eliminated from the source tree.
miker [Fri, 5 Dec 2008 20:02:35 +0000 (20:02 +0000)]
Patch from Scott McKellar:
1. Move the declaration of osrf_app_request_struct, and its typedef as
osrfAppRequest, out of the header and into osrf_app_session.c.
2. In the declaration of osrf_app_session_struct: remove an obsolete
and commented-out declaration of request_queue.
3. Abolished _osrf_app_session_free(), moving its contents into
osrfAppSessionFree().
4. In osrfAppSessionCleanup(): after freeing the cache, nullify the
pointer to it, in the interests of good hygiene.
5. In _osrf_app_request_free(): free the messages in the result queue.
6. In _osrf_app_request_recv(): Eliminated the useless intermediate
variables tmp_msg used for dequeuing messages.
7. In osrf_app_session_set_locale: If the existing session_locale is
big enough to hold the new locale, use strcpy() instead of free() and
strdup().
8. In osrf_app_session_set_remote: If the existing remote_id is big
enough to hold the new remote_id, use strcpy() instead of free() and
strdup().
9. To eliminate some duplication of code, call
osrf_app_session_set_locale() and osrf_app_session_set_remote() in
_osrf_app_request_recv() and osrf_app_session_reset_remote(),
respectively.
10. Performance tweak: in osrfAppRequestRespondComplete: don't create
the payload message unless we're actually going to use it.
11. Make osrfAppSessionCache static, since no other source file
references it.
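The reuse strategy in items 7 and 8 might look something like the sketch below. The function name demo_set_string is hypothetical, and the sketch assumes the existing string was allocated with at least strlen() + 1 bytes, as it would be if it came from strdup():

```c
#include <stdlib.h>
#include <string.h>

/* Replace *dest with a copy of src, reusing the existing allocation when
   the current string is at least as long as the new one.
   Returns *dest, or NULL on bad arguments or allocation failure. */
char* demo_set_string(char** dest, const char* src) {
    if (!dest || !src)
        return NULL;
    if (*dest && strlen(*dest) >= strlen(src)) {
        strcpy(*dest, src);          /* fits: no free() or malloc() needed */
    } else {
        char* copy = malloc(strlen(src) + 1);
        if (!copy)
            return NULL;
        strcpy(copy, src);
        free(*dest);                 /* free(NULL) is a no-op */
        *dest = copy;
    }
    return *dest;
}
```

The length test is conservative: after a shorter string has been copied in, the allocation may be larger than strlen() suggests, so the function sometimes reallocates when it did not strictly need to.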
dbs [Sun, 30 Nov 2008 05:30:44 +0000 (05:30 +0000)]
Merge patch from Scott McKellar for better Unicode handling
The attached files contain a drop-in replacement for the
buffer_append_uescape function that I submitted a few days ago. I regard
this new one as experimental, at least for now.
They also offer some byte-testing functions, and the equivalent macros,
that may be useful in other code that deals with UTF-8 strings.
The new function buffer_append_utf8() differs from buffer_append_uescape()
in the following ways:
1. It treats 0x7F as a control character, which it is.
2. It is more finicky about recognizing the header byte of multibyte
characters. For example 0xF6 is not a valid UTF-8 header byte.
3. When it sees a nul byte in the middle of a multibyte character, it
stops. In the same situation, the older buffer_append_uescape() and
uescape() functions accumulate the nul byte into the hex codes they
build and then keep going, risking not only misbehavior but undefined
behavior.
4. When it finds invalid UTF-8 characters in the input string, it skips
over the invalid UTF-8 until it finds a valid character, and then
continues to translate the rest. In other words it excises the garbage
and translates the rest intact.
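The behavior in items 2 through 4 can be approximated by the simplified sketch below. It is not the finite state machine from osrf_utf8.c; the function names are hypothetical, the header-byte ranges follow RFC 3629 (an assumption about the intent), and it does not reject every overlong or surrogate encoding:

```c
#include <stddef.h>

/* Return the total length in bytes of a UTF-8 sequence that starts with
   byte c, or 0 if c cannot start a sequence. */
int demo_utf8_seq_len(unsigned char c) {
    if (c < 0x80) return 1;                 /* ASCII */
    if (c >= 0xC2 && c <= 0xDF) return 2;   /* 0xC0 and 0xC1 are overlong */
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;   /* continuation byte (0x80..0xBF) or invalid header */
}

/* Copy the well-formed sequences from in to out, dropping invalid bytes.
   Stops at the first nul, including a nul inside a multibyte sequence.
   out must have room for strlen(in) + 1 bytes.
   Returns the number of bytes written, excluding the nul. */
size_t demo_utf8_strip_invalid(const char* in, char* out) {
    const unsigned char* p = (const unsigned char*) in;
    size_t n = 0;
    while (*p) {
        int len = demo_utf8_seq_len(*p);
        int i;
        if (len == 0) {
            p++;                      /* skip a garbage byte */
            continue;
        }
        for (i = 1; i < len; i++) {
            if ((p[i] & 0xC0) != 0x80)
                break;                /* not a continuation; also catches nul */
        }
        if (i < len) {
            if (p[i] == '\0')
                break;                /* nul inside a sequence: stop */
            p += i;                   /* drop the truncated sequence */
            continue;
        }
        for (i = 0; i < len; i++)
            out[n++] = (char) p[i];
        p += len;
    }
    out[n] = '\0';
    return n;
}
```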
---------
The file osrf_utf8.c includes an array of bitmasks that it uses to look
up the characteristics of each byte. Not trusting myself to do that
much tedious typing by hand, I wrote a program to write the list of
bitmasks. The macros are broadly similar to the standard C functions
isprint(), isalpha(), and so forth.
There is also a collection of functions, equivalent to the macros, with
the same names except using double underscores. These may never find a
use, but they're there in case anyone ever needs a function pointer for
some reason.
The logic uses a finite state machine (FSM) to examine and dispatch each
byte in the input stream. Because it needs to branch on the current
state as well as the type of each character, this logic is a little
slower than buffer_append_uescape(). However pretty much any
implementation of the same behavior would probably incur some such extra
overhead in some form.
-------------
(Regarding the second version of osrf_utf8.c):
When it encounters a code point too big to fit into 16 bits (after
stripping out the packaging bits), it formats it into a surrogate pair
of four hex digits each, rather than a single set of five or six hex
digits.
In addition, this new version no longer uses buffer_fadd() to format
hex values.
The code for constructing surrogate pairs is a slightly simplified version
of a code snippet found at:
http://www.unicode.org/faq/utf_bom.html
The code snippet seems to come from a pretty authoritative source, and
my modifications were minimal, consisting mostly of collecting a couple
of constant expressions into constant values.
In the case of the G clef character (U+1D11E), I verified that my code
translates it to the correct surrogate pair ("\uD834\uDD1E").
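For reference, the surrogate-pair arithmetic can be sketched as below. The function name demo_surrogate_pair is hypothetical, and the sketch uses snprintf() for brevity, whereas the patch itself formats the hex digits without the printf family:

```c
#include <stdio.h>

/* Format a code point above U+FFFF as a pair of \uXXXX escapes.
   out must have room for 13 bytes (12 characters plus the nul).
   Returns 0 on success, -1 if the code point is out of range. */
int demo_surrogate_pair(unsigned long cp, char* out) {
    if (cp < 0x10000UL || cp > 0x10FFFFUL)
        return -1;
    unsigned long v = cp - 0x10000UL;                 /* 20 significant bits */
    unsigned hi = 0xD800u + (unsigned)(v >> 10);      /* top 10 bits */
    unsigned lo = 0xDC00u + (unsigned)(v & 0x3FFu);   /* low 10 bits */
    snprintf(out, 13, "\\u%04X\\u%04X", hi, lo);
    return 0;
}
```

For U+1D11E this gives v = 0xD11E, a high surrogate of 0xD800 + 0x34 = 0xD834, and a low surrogate of 0xDC00 + 0x11E = 0xDD1E.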
Scott McKellar
http://home.swbell.net/mck9/ct/
Developer's Certificate of Origin 1.1 By making a contribution to
this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license indicated
in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source license
and I have the right under that license to submit that work with
modifications, whether created in whole or in part by me, under the
same open source license (unless I am permitted to submit under a
different license), as indicated in the file; or
(c) The contribution was provided directly to me by some other person
who certified (a), (b) or (c) and I have not modified it; and
(d) In the case of each of (a), (b), or (c), I understand and agree
that this project and the contribution are public and that a record
of the contribution (including all personal information I submit
with it, including my sign-off) is maintained indefinitely and may
be redistributed consistent with this project or the open source
license indicated in the file.
erickson [Wed, 26 Nov 2008 17:56:42 +0000 (17:56 +0000)]
Two patches from Scott McKellar:
In some earlier patches I eliminated the old osrf_app_client_session_init
function in favor of its camel-case equivalent. Unfortunately I didn't
notice that the new function now puts the old function's name into an
error message. Here's the fix for that little oversight.
---
These patches replace a couple of deprecated identifiers with their
camel-case equivalents:
dbs [Tue, 25 Nov 2008 03:16:13 +0000 (03:16 +0000)]
Cheap hack to enable OpenSRF to build on recent glibc systems
(already in use in src/libopensrf/osrf_system.c)
Seems to stem from HOST_NAME_MAX moving from /usr/include/sys/param.h
to /usr/include/bits/local_lim.h
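The commit message does not show the change itself. One common form of such a workaround, given here only as an assumption, is a fallback definition guarded by #ifndef; the value 256 and the helper name are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* Fallback for systems whose headers do not expose HOST_NAME_MAX.
   The value 256 is an assumed placeholder, not taken from the commit. */
#ifndef HOST_NAME_MAX
#define HOST_NAME_MAX 256
#endif

/* Size of a buffer able to hold any host name plus the terminal nul. */
size_t demo_hostname_buf_size(void) {
    return (size_t) HOST_NAME_MAX + 1;
}
```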
dbs [Thu, 20 Nov 2008 01:27:16 +0000 (01:27 +0000)]
Force Class::DBI install, as dependencies have started making test t/11 (triggers) fail;
We don't use trigger support in Class::DBI, so forcing the install should be okay.
erickson [Tue, 18 Nov 2008 22:23:29 +0000 (22:23 +0000)]
2 Patches from Scott McKellar, with slight modification:
This patch adds a new function buffer_append_uescape(), a streamlined
replacement for uescape(). The optimizations are as follows:
1. As described in an earlier post, the new function appends the uescaped
text to an existing growing_buffer, rather than into an internal one that
it must create and destroy. It thereby saves two mallocs and two frees.
In addition, the calling code doesn't have to do a buffer_add().
2. Because it doesn't create an internal growing_buffer, it doesn't need
to do a strlen() to determine how big a buffer to allocate.
3. Since the new version doesn't have a boolean parameter, it doesn't have
to test the boolean.
4. For characters < 128 (i.e. ASCII characters), I rearranged the order of
the tests so that non-control characters are recognized immediately. In
uescape() we first go through a switch/case looking for several specific
control characters like '\b' and '\n'. In practice most characters are
not control characters, so this rearrangement saves a few CPU cycles.
5. For control characters, uescape() uses buffer_fadd() to format the
character into hex. Now, buffer_fadd is slow because it uses vsnprintf()
twice, once to determine a buffer size and once to do the formatting. In
turn vsnprintf() is slow because it has to parse the format string. In
this case we don't need vsnprintf() because we already know exactly how
big the buffer needs to be, and the formatting is simple. I eliminated
buffer_fadd() and formatted the hex manually with a little bit-twiddling.
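The manual hex formatting in item 5 could be sketched along these lines. The function name and the choice of lowercase hex digits are assumptions, not taken from the patch:

```c
/* Write the six characters \uXXXX for a 16-bit value into out, followed
   by a nul. out must have room for 7 bytes. Each hex digit is picked out
   with a shift and a mask, so no format string has to be parsed. */
void demo_hex_escape(unsigned value, char* out) {
    static const char digits[] = "0123456789abcdef";
    out[0] = '\\';
    out[1] = 'u';
    out[2] = digits[(value >> 12) & 0xF];
    out[3] = digits[(value >> 8) & 0xF];
    out[4] = digits[(value >> 4) & 0xF];
    out[5] = digits[value & 0xF];
    out[6] = '\0';
}
```

Since the output length is always known in advance, the caller can reserve the space once instead of asking vsnprintf() to measure it.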
Some of these optimizations could be applied to uescape(), but I haven't
bothered, partly because I wanted a clean comparison for benchmarking
purposes and partly because I expect uescape() to disappear from use
(though I am leaving it available).
=====
This patch is a rewrite of the jsonObjectToJSON and jsonObjectToJSONRaw
functions. It is dependent on my previous patch to utils.c and utils.h,
adding the new buffer_append_uescape function.
One purpose is to replace a call to the uescape function with a call to
the faster buffer_append_uescape function. The other purpose is to
introduce a faster way to translate a jsonObject into a string.
(Also in one spot I broke up a very long string literal into several
smaller pieces so that it wouldn't wrap around in the editor.)
In the existing jsonObjectToJSON function, we receive a pointer to a
jsonObject and return a string of JSON. However we don't translate the
original jsonObject directly. Instead, we create a modified clone of the
original, inserting an additional JSON_HASH node wherever we find a
classname. Then we translate the clone, and finally destroy it.
It always struck me as an egregious waste to create and destroy a whole
parallel object just so that we could turn it into a string. So I looked
for a way to eliminate the cloning.
The result is a modification of add_json_to_buffer(), a local function
that recursively traverses and translates the jsonObject. When it sees a
classname (and it has been asked to expand classnames), the new version
inserts additional gibberish into the output text and then continues the
traversal, without modifying or copying the original jsonObject at all.
In my benchmark, this new approach was faster than the original by a
factor of about 5. When I combined this change with the use of the new
buffer_append_uescape function, it was faster by a factor of about 7.2.
The benchmark used a moderately complex jsonObject about 5 or 6 levels
deep, with both hashes and arrays, with classnames at several levels.
The performance gain will no doubt depend on the contents of the
jsonObject, but I haven't tried to isolate the variables.
The new version is a bit trickier and harder to read than the old. In my
opinion the speedup is worth the obscurity, because a lot of places in
Evergreen will benefit.
erickson [Mon, 17 Nov 2008 03:17:50 +0000 (03:17 +0000)]
Patch from Scott McKellar:
This patch is mostly a couple of tweaks to the growing_buffer code, loosely
related to my previous patch to utils.h. There is also a small tweak to
uescape().
1. In buffer_add() I replaced strcat() with strcpy() for appending the new
string. Since we already know where the end of the old string is, we don't
need to ask strcat() to find it for us.
2. In buffer_reset(), the old code contains the following:
osrf_clearbuf( gb->buf, sizeof(gb->buf) );
The evident intent is to clear the buffer. However sizeof(gb->buf) is not
the size of the buffer, it's the size of the pointer to the buffer. We
were clearing only the first four bytes or so. I changed the line to:
osrf_clearbuf( gb->buf, gb->size );
3. Also in buffer_reset(), I added a line to populate the first byte of
the buffer with a nul, to ensure that the length of the (empty) string
matches the n_used member.
4. In uescape(), we were examining the contents of string[] without first
verifying that string was not NULL. The result would be undefined
behavior if string were ever NULL. I added a couple of lines to treat
a NULL pointer as if it were a pointer to an empty string.
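The sizeof pitfall in item 2 and the fix in item 3 can be illustrated with the sketch below. The demo_gb2 type and function names are hypothetical, and memset() with '!' stands in for osrf_clearbuf() in its debugging mode:

```c
#include <string.h>

/* Hypothetical simplified buffer; field names are assumptions. */
typedef struct {
    char* buf;      /* storage for the string */
    size_t size;    /* number of bytes available in buf */
    size_t n_used;  /* length of the stored string */
} demo_gb2;

/* sizeof(gb->buf) is the size of a char*, typically 4 or 8 bytes,
   no matter how much memory buf points to. */
size_t demo_wrong_clear_len(const demo_gb2* gb) {
    return sizeof(gb->buf);
}

/* Clear the whole buffer and leave an empty, nul-terminated string. */
void demo_reset(demo_gb2* gb) {
    if (!gb || !gb->buf || gb->size == 0)
        return;
    memset(gb->buf, '!', gb->size);   /* use the tracked size, not sizeof */
    gb->buf[0] = '\0';                /* string length now matches n_used */
    gb->n_used = 0;
}
```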
erickson [Mon, 17 Nov 2008 02:56:40 +0000 (02:56 +0000)]
Patch from Scott McKellar:
This patch fixes a bug in the OSRF_BUFFER_ADD_CHAR macro.
Like the corresponding buffer_add_char function, this macro appends a
specified character to a growing_buffer. Unlike the function, however, the
existing version of the macro does not also append a terminal nul.
This bug had gone unnoticed because, most of the time, the rest of the
buffer is already filled with nuls, left over from the initial creation of
the growing_buffer. I stumbled across the problem when, in the course of
writing a test harness for some other changes, I called buffer_reset()
in order to reuse an existing growing_buffer instead of destroying and
re-creating it.
With debugging turned on, buffer_reset() fills the buffer with exclamation
points, leaving a nul only in the very last byte. Later, if we use
buffer_add() or buffer_fadd() to extend the string stored in the
growing_buffer, it uses strcat() to append the new characters. The result
is a buffer overflow.
Actually buffer_reset() should place a nul in the first byte of the buffer.
Tomorrow I shall submit a patch to that effect.
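A simplified sketch of an append-character macro that always writes the terminal nul. The demo_cb type and DEMO_ADD_CHAR name are hypothetical, and unlike a real growing_buffer this version silently drops the character when the buffer is full instead of growing:

```c
#include <stddef.h>

/* Hypothetical fixed-capacity buffer; field names are assumptions. */
typedef struct {
    char* buf;
    size_t size;     /* capacity in bytes, including room for the nul */
    size_t n_used;   /* length of the stored string */
} demo_cb;

/* Append one character and re-terminate the string, so the result is
   valid even when the rest of the buffer holds non-nul filler. */
#define DEMO_ADD_CHAR(gb, c)                          \
    do {                                              \
        demo_cb* _gb = (gb);                          \
        if (_gb->n_used + 1 < _gb->size) {            \
            _gb->buf[_gb->n_used++] = (c);            \
            _gb->buf[_gb->n_used] = '\0';             \
        }                                             \
    } while (0)
```

With the buffer pre-filled with exclamation points, appending 'a' and 'b' still leaves the string "ab", because the nul is written on every call.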
dbs [Mon, 27 Oct 2008 05:07:06 +0000 (05:07 +0000)]
Clean up the source tree a little more:
* Delete setup.py.in (as we're not modifying it)
* Make math_client.py be modified with SYSCONFDIR location per other scripts
(although slightly longer term we'll need to stop modifying all of these
in place, because that doesn't work after the first ./configure run)
* Add a few files to automake's tracking so that make dist is a little happier
erickson [Fri, 24 Oct 2008 16:31:07 +0000 (16:31 +0000)]
the pool cleanup handler which was thought to only run on top-level child process exit is running on cloned processes cleanup. this is how mod_cgi runs scripts. disabling cleanup for now. note: this cleanup is new to 1.0
erickson [Mon, 13 Oct 2008 20:44:50 +0000 (20:44 +0000)]
IO::Socket::INET, somewhere between version 1.29 and 1.31, requires the peerport to be explicitly cast to an int. also updated error handling to use correct error var