[soci-users] unsupported BLOB vector support @ ODBC

Discussion:

h***@code-styling.de

2013-06-25 15:58:54 UTC

Besides implementing MSSQL Unicode support over std::string, I stumbled
over SOCI's missing BLOB support in the ODBC backend.
We would like to use SOCI as a replacement in a middleware project of 3.2M
lines of code, but we have to check all capabilities carefully up front.
SOCI is still in scope (alongside other libraries), but after partly solving
the MS Unicode issues we have to deal with BLOBs too.
It's not possible to fetch them over ODBC into vectors, and any other way is
not suitable for performance reasons.

So I did some tests to provide BLOB reading/writing using vectors, and it seems
that it can be handled as straightforwardly as is currently done with
std::wstring (but with special handling to allocate the buffer lazily, once
the real size of each BLOB is known).
In the MSSQL world I can handle the BLOB field with SQL_C_BINARY, but I'm afraid
this is only half the story.
I would prefer some more input on how BLOBs are used on other systems via ODBC
before fixing the definition of a second, additional base type: x_binary
(besides the existing x_blob), with a more straightforward class wrapping an
unsigned char vector as the content holder.
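
The lazy-allocation idea mentioned above could be sketched roughly as follows.
This is not SOCI code: read_blob, ChunkSource and make_memory_source are
hypothetical names, and the callback only imitates the way ODBC's SQLGetData
reports the number of bytes still available when a SQL_C_BINARY read is
truncated. A real driver may also report SQL_NO_TOTAL, which this sketch does
not cover.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one SQLGetData(..., SQL_C_BINARY, ...) call.
// Copies up to `capacity` bytes into `out` and returns how many bytes were
// still unread BEFORE this call (0 when nothing is left).
typedef std::function<std::size_t(unsigned char* out, std::size_t capacity)> ChunkSource;

// Read a BLOB of unknown size: probe with a small buffer first, then grow
// the buffer exactly once, after the real size is known.
std::vector<unsigned char> read_blob(const ChunkSource& next_chunk,
                                     std::size_t probe_size = 256)
{
    std::vector<unsigned char> data(probe_size);
    const std::size_t total = next_chunk(data.data(), data.size());
    if (total <= data.size())
    {
        data.resize(total);           // the whole BLOB fit into the probe
        return data;
    }

    data.resize(total);               // lazy allocation of the real size
    std::size_t offset = probe_size;  // the probe bytes are already in place
    while (offset < total)
    {
        const std::size_t left = next_chunk(data.data() + offset, total - offset);
        if (left == 0)
            break;                    // source ended earlier than announced
        offset += std::min(left, total - offset);
    }
    data.resize(offset);
    return data;
}

// In-memory source used only to exercise the sketch without a database.
ChunkSource make_memory_source(std::vector<unsigned char> blob)
{
    std::size_t pos = 0;
    return [blob, pos](unsigned char* out, std::size_t capacity) mutable
    {
        const std::size_t unread = blob.size() - pos;
        const std::size_t n = std::min(unread, capacity);
        std::copy_n(blob.begin() + pos, n, out);
        pos += n;
        return unread;
    };
}
```

With this shape, small BLOBs cost one call and no reallocation, while large
ones cost one extra allocation instead of repeated growth.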

I haven't looked through the other backends (besides ODBC) yet, but I think it
could be handled more or less similarly there in the end.

I appreciate any input on BLOB handling, to avoid getting tunnel vision
:-)

regards

Heiko

h***@code-styling.de

2013-06-26 13:32:39 UTC

I have now found a possible solution for the ODBC layer.
But it has a limitation when bulk-fetching BLOB data: every C++ bulk BLOB
select I have seen so far works only with fixed-size BLOBs.
That's the only way to bind them into vectors successfully.

Example application code from my local working version:

records = 0;
int rows_per_fetch = 50;

vector<int> primary(rows_per_fetch);
vector<int> sequence(rows_per_fetch);
vector<string> name(rows_per_fetch);
// 32000 bytes per blob = 1.52 MB in total for the blob buffer
vector<binarydata> content(rows_per_fetch, 32000);

vector<indicator> ind(rows_per_fetch);

statement st2 = (sql.prepare <<
    "select rc_sequence_id, sequence, name, content "
    "from rc_sequence order by rc_sequence_id, sequence DESC",
    into(primary),
    into(sequence),
    into(name, ind),
    into(content)
);
st2.execute();
while (st2.fetch())
{
    // the last round trip may have shrunk the vector(s)
    for (size_t i = 0; i < primary.size(); i++)
    {
        std::string txt(content[i].begin(), content[i].end());
        printf("%d|%d|%s\n", primary[i], sequence[i], name[i].c_str());
        printf("--------------------------------------------\n");
        printf("%s\n", txt.c_str());
        printf("--------------------------------------------\n");
    }
    records += primary.size();
}

The newly introduced class soci::binarydata inherits publicly from
std::vector<unsigned char> and can be used like an ordinary vector of bytes.
It also permits defining fixed-size BLOBs by using the resize (fill)
constructor. Internally, the buffer is sized according to the sizes of the
vector's items.
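
For readers without access to the fork, here is a minimal sketch of what such
a class might look like. It is an assumption based only on the description
above; the real soci::binarydata may differ, which is why the sketch lives in
its own namespace. bulk_buffer_bytes is a hypothetical helper that shows how a
backend could derive the bulk buffer size from the item sizes.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Assumed shape of the class described above (not the actual fork code).
class binarydata : public std::vector<unsigned char>
{
public:
    typedef std::vector<unsigned char> base;

    binarydata() {}

    // Fill constructor: creates a BLOB slot of a fixed size. Deliberately not
    // explicit, so that vector<binarydata>(rows, 32000) can convert 32000.
    binarydata(std::size_t fixed_size, unsigned char fill = 0)
        : base(fixed_size, fill) {}
};

// Hypothetical helper: total number of bytes needed to hold every slot of a
// bulk column, derived from the sizes of the individual items.
std::size_t bulk_buffer_bytes(const std::vector<binarydata>& column)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < column.size(); ++i)
        total += column[i].size();
    return total;
}

} // namespace sketch
```

For 50 rows of 32000 bytes each, bulk_buffer_bytes yields 1,600,000 bytes,
which matches the roughly 1.52 MB mentioned in the example. Note that
std::vector has no virtual destructor, so a class derived this way should not
be deleted through a pointer to the base class.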

For simple (non-bulk) BLOB reads, fetching row by row, this should deliver the
full BLOB without this limitation, even for very large BLOBs.

It also avoids passing the session into the blob constructor, as the current
backend class does.
I will have to check whether this is possible for all scenarios/backends,
managing the special BLOB handling (Oracle etc.) behind the scenes and not at
construction.
In my opinion, no data representation object (string, blob, integer etc.)
should have to know its session; that is not the right place for this
responsibility.

I know that this bulk usage may truncate very large BLOBs. But if the
DB/application works with size-limited BLOBs in sequences (as we do),
fixed-size BLOB reads are a suitable way to speed up fetching.
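
To illustrate the truncation risk, the sketch below unpacks a contiguous bulk
buffer of fixed-size slots and flags rows whose stored value was longer than
the slot. It assumes, as with ODBC column-wise binding, that the driver
reports the full length of each value next to the data. blob_slot and
unpack_fixed_slots are hypothetical names, not part of SOCI or of the fork.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Result of unpacking one fixed-size slot.
struct blob_slot
{
    std::vector<unsigned char> bytes; // the bytes that actually fit
    bool truncated;                   // true if the stored BLOB was longer
};

// Unpack slots of `slot_size` bytes each from one contiguous buffer.
// full_lengths[i] is the length reported for row i; it may exceed slot_size
// when the value did not fit into its slot.
std::vector<blob_slot> unpack_fixed_slots(const std::vector<unsigned char>& buffer,
                                          const std::vector<std::size_t>& full_lengths,
                                          std::size_t slot_size)
{
    std::vector<blob_slot> result;
    result.reserve(full_lengths.size());
    for (std::size_t row = 0; row < full_lengths.size(); ++row)
    {
        const std::size_t offset = row * slot_size;
        if (offset + slot_size > buffer.size())
            break; // buffer holds fewer slots than lengths were supplied

        const std::size_t used = std::min(full_lengths[row], slot_size);
        blob_slot slot;
        slot.bytes.assign(buffer.begin() + offset, buffer.begin() + offset + used);
        slot.truncated = full_lengths[row] > slot_size;
        result.push_back(slot);
    }
    return result;
}
```

An application that cannot guarantee an upper bound on BLOB size could use
such a flag to re-fetch the affected rows one by one.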

Any response is welcome; a fork of version 3.2.1 with my modifications
(directly inside the ODBC layer for now) will follow soon.

regards

Heiko


Mateusz Loskot

2013-06-26 21:12:39 UTC

Post by h***@code-styling.de
The new used class soci::binarydata inherits public from
std::vector<unsigned char> and can be used like an ordinary vector of bytes.

A similar idea has been discussed before, as I mentioned in my previous reply.
You can see some of the outcome and links to prototypes here:

http://soci.6940.n7.nabble.com/soci-users-MySQL-Query-select-round-111100237735-42999-2-td3229.html#a3262

I'm in favour of what Aleksander suggested: std::string.

Best regards,

--
Mateusz Loskot, http://mateusz.loskot.net

Mateusz Loskot

2013-06-26 21:10:27 UTC

Perhaps soci::blob would be handy for ODBC; not sure though.

Post by h***@code-styling.de
I would prefer some more input on how BLOBs are used on other systems via
ODBC before fixing the definition of a second, additional base type:
x_binary (besides the existing x_blob), with a more straightforward class
wrapping an unsigned char vector as the content holder.
I haven't looked through the other backends (besides ODBC) yet, but I think
it could be handled more or less similarly there in the end.
I appreciate any input on BLOB handling, to avoid getting tunnel vision :-)

There have been some discussions about how to support binary streams,
and they all led to the conclusion that we don't need any additional
container type; just stick to std::string.
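
As a small illustration of why std::string can serve as a binary container,
and where care is needed, consider the sketch below. to_binary_string and
c_string_length are hypothetical helper names, not SOCI functions. The point
is that std::string tracks its own length, so embedded zero bytes survive,
whereas anything that goes through c_str() and a C-style API stops at the
first zero byte.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Build a std::string from raw bytes. The length is passed explicitly, so
// embedded '\0' bytes are preserved.
std::string to_binary_string(const unsigned char* data, std::size_t size)
{
    return std::string(reinterpret_cast<const char*>(data), size);
}

// Length as seen by C-style APIs such as printf("%s", ...), which stop at
// the first '\0' byte.
std::size_t c_string_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

For the four bytes 'a', 0, 'b', 'c', the resulting string has size() 4 while
c_string_length reports 1, so binary content held in a std::string should be
written out with data() and size() rather than with "%s".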

You may find these two threads useful:

http://soci.6940.n7.nabble.com/SOCI-users-BLOB-for-mysql-backend-td1247.html

http://soci.6940.n7.nabble.com/soci-users-MySQL-Query-select-round-111100237735-42999-2-td3229.html

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

Mateusz Loskot

2013-06-26 21:28:29 UTC

Post by Mateusz Loskot

Perhaps soci::blob would be handy for ODBC; not sure though.

There have been some discussions about how to support binary streams,
and they all led to the conclusion that we don't need any additional
container type; just stick to std::string.
http://soci.6940.n7.nabble.com/SOCI-users-BLOB-for-mysql-backend-td1247.html
http://soci.6940.n7.nabble.com/soci-users-MySQL-Query-select-round-111100237735-42999-2-td3229.html

Here is another one, where the background of some of the original decisions
is explained a bit:

http://thread.gmane.org/gmane.comp.db.soci.user/909

Best regards,

--
Mateusz Loskot, http://mateusz.loskot.net